Chapter 5: Concurrency Patterns Overview

"You look at where you're going and where you are and it never makes sense, but then you look back at where you've been and a pattern seems to

emerge. And if you project forward from that pattern, then sometimes you can come up with something."

Robert M. Pirsig

This chapter presents five patterns that address various types of concurrency architecture and design issues for components, subsystems, and applications: Active Object, Monitor Object, Half-Sync/Half-Async, Leader/Followers, and Thread-Specific Storage.

The choice of concurrency architecture has a significant impact on the design and performance of multi-threaded networking middleware and applications. No single concurrency architecture is suitable for all workload conditions and hardware and software platforms. The patterns in this chapter therefore collectively provide solutions to a variety of concurrency problems.

The first two patterns in this chapter specify designs for sharing resources among multiple threads or processes:

§ The Active Object design pattern (369) decouples method execution from method invocation. Its purpose is to enhance concurrency and simplify synchronized access to objects that reside in their own threads of control.

§ The Monitor Object design pattern (399) synchronizes concurrent method execution to ensure that only one method at a time runs within an object. It also allows an object's methods to schedule their execution sequences cooperatively.

Both patterns can synchronize and schedule methods invoked concurrently on objects. The main difference is that an active object executes its methods in a different thread than its clients, whereas a monitor object executes its methods by borrowing the thread of its clients. As a result active objects can perform more sophisticated—albeit expensive—scheduling to determine the order in which their methods execute.

The next two patterns in this chapter define higher-level concurrency architectures:

§ The Half-Sync/Half-Async architectural pattern (423) decouples asynchronous and synchronous processing in concurrent systems, to simplify programming without reducing performance unduly. The pattern introduces two intercommunicating layers, one for asynchronous and one for synchronous service processing. A further queuing layer mediates communication between services in the asynchronous and synchronous layers.

§ The Leader/Followers architectural pattern (447) provides an efficient concurrency model where multiple threads take turns to share a set of event sources to detect, demultiplex, dispatch, and process service requests that occur on the event sources. The Leader/Followers pattern can be used in lieu of the Half-Sync/Half-Async and Active Object patterns to improve performance when there are no synchronization or ordering constraints on the processing of requests by pooled threads.

Implementors of the Half-Sync/Half-Async and Leader/Followers patterns can use the Active Object and Monitor Object patterns to coordinate access to shared objects efficiently.

The final pattern in this chapter offers a different strategy for addressing certain inherent complexities of concurrency:

§ The Thread-Specific Storage design pattern (475) allows multiple threads to use one 'logically global' access point to retrieve an object that is local to a thread, without incurring locking overhead on each access to the object. To some extent this pattern can be viewed as the 'antithesis' of the other patterns in this section, because it addresses several inherent complexities of concurrency by preventing the sharing of resources among threads.

Implementations of all patterns in this chapter can use the patterns from Chapter 4, Synchronization Patterns, to protect critical regions from concurrent access.

Other patterns in the literature that address concurrency-related issues include Master-Slave [POSA1], Producer-Consumer [Grand98], Scheduler [Lea99a], and Two-phase Termination [Grand98].

Active Object

The Active Object design pattern decouples method execution from method invocation to enhance concurrency and simplify synchronized access to objects that reside in their own threads of control.

Also Known As

Concurrent Object

Example

Consider the design of a communication gateway,[1] which decouples cooperating components and allows them to interact without having direct dependencies on each other. As shown below, the gateway may route messages from one or more supplier processes to one or more consumer processes in a distributed system.

The suppliers, consumers, and gateway communicate using TCP [Ste93], which is a connection-oriented protocol. The gateway may therefore encounter flow control from the TCP transport layer when it tries to send data to a remote consumer. TCP uses flow control to ensure that fast suppliers or gateways do not produce data more rapidly than slow consumers or congested networks can buffer and process the data. To improve end-to-end quality of service (QoS) for all suppliers and consumers, the entire gateway process must not block while waiting for flow control to abate over any one connection to a consumer. In addition, the gateway must scale up efficiently as the number of suppliers and consumers increases.

An effective way to prevent blocking and improve performance is to introduce concurrency into the gateway design, for example by associating a different thread of control with each TCP connection. This design enables threads whose TCP connections are flow controlled to block without impeding the progress of threads whose connections are not flow controlled.

We thus need to determine how to program the gateway threads and how these threads interact with supplier and consumer handlers.

Context

Clients that access objects running in separate threads of control.

Problem

Many applications benefit from using concurrent objects to improve their quality of service, for example by allowing an application to handle multiple client requests simultaneously. Instead of using a single-threaded passive object, which executes its methods in the thread of control of the client that invoked the methods, a concurrent object resides in its own thread of control. If objects run concurrently, however, we must synchronize access to their methods and data if these objects are shared and modified by multiple client threads, in which case three forces arise:

§ Processing-intensive methods invoked on an object concurrently should not block the entire process indefinitely, thereby degrading the quality of service of other concurrent objects.

For example, if one outgoing TCP connection in our gateway example is blocked due to flow control, the gateway process still should be able to run other threads that can queue new messages while waiting for flow control to abate. Similarly, if other outgoing TCP connections are not flow controlled, it should be possible for other threads in the gateway to send messages to their consumers independently of any blocked connections.

§ Synchronized access to shared objects should be straightforward to program. In particular, client method invocations on a shared object that are subject to synchronization constraints should be serialized and scheduled transparently.

Applications like our gateway can be hard to program if developers use low-level synchronization mechanisms, such as acquiring and releasing mutual exclusion (mutex) locks explicitly. Methods that are subject to synchronization constraints, such as enqueueing and dequeueing messages from TCP connections, should be serialized transparently when objects are accessed by multiple threads.

§ Applications should be designed to leverage the parallelism available on a hardware/software platform transparently.

In our gateway example, messages destined for different consumers should be sent concurrently by a gateway over different TCP connections. If the entire gateway is programmed to only run in a single thread of control, however, performance bottlenecks cannot be alleviated transparently by running the gateway on a multi-processor platform.

Solution

For each object exposed to the forces above, decouple method invocation on the object from method execution. Method invocation should occur in the client's thread of control, whereas method execution should occur in a separate thread. Moreover, design the decoupling so the client thread appears to invoke an ordinary method.

In detail: A proxy [POSA1] [GoF95] represents the interface of an active object and a servant [OMG98a] provides the active object's implementation. Both the proxy and the servant run in separate threads so that method invocations and method executions can run concurrently. The proxy runs in the client thread, while the servant runs in a different thread.

At run-time the proxy transforms the client's method invocations into method requests, which are stored in an activation list by a scheduler. The scheduler's event loop runs continuously in the same thread as the servant, dequeueing method requests from the activation list and dispatching them on the servant. Clients can obtain the result of a method's execution via a future returned by the proxy.

Structure

An active object consists of six components:

A proxy [POSA1] [GoF95] provides an interface that allows clients to invoke publicly-accessible methods on an active object. The use of a proxy permits applications to program using standard strongly-typed language features, rather than passing loosely-typed messages between threads. The proxy resides in the client's thread.

When a client invokes a method defined by the proxy it triggers the construction of a method request object. A method request contains the context information, such as a method's parameters, necessary to execute a specific method invocation and return any result to the client. A method request class defines an interface for executing the methods of an active object. This interface also contains guard methods that can be used to determine when a method request can be executed. For every public method offered by a proxy that requires synchronized access in the active object, the method request class is subclassed to create a concrete method request class.

A proxy inserts the concrete method request it creates into an activation list. This list maintains a bounded buffer of pending method requests created by the proxy and keeps track of which method requests can execute. The activation list decouples the client thread where the proxy resides from the thread where the servant method is executed, so the two threads can run concurrently. The internal state of the activation list must therefore be serialized to protect it from concurrent access.

A scheduler runs in a different thread than its client proxies, namely in the active object's thread. It decides which method request to execute next on an active object. This scheduling decision is based on various criteria, such as ordering—the order in which methods are called on the active object—or certain properties of an active object, such as its state. A scheduler can evaluate these properties using the method requests' guards, which determine when it is possible to execute the method request [Lea99a]. A scheduler uses an activation list to manage method requests that are pending execution. Method requests are inserted in an activation list by a proxy when clients invoke one of its methods.

A servant defines the behavior and state that is modeled as an active object. The methods a servant implements correspond to the interface of the proxy and method requests the proxy creates. It may also contain other predicate methods that method requests can use to implement their guards. A servant method is invoked when its associated method request is executed by a scheduler. Thus, it executes in its scheduler's thread.

When a client invokes a method on a proxy it receives a future [Hal85] [LS88]. This future allows the client to obtain the result of the method invocation after the servant finishes executing the method. Each future reserves space for the invoked method to store its result. When a client wants to obtain this result, it can rendezvous with the future, either blocking or polling until the result is computed and stored into the future.

The class diagram for the Active Object pattern is shown below:

Dynamics

The behavior of the Active Object pattern can be divided into three phases:

§ Method request construction and scheduling. A client invokes a method on the proxy. This triggers the creation of a method request, which maintains the argument bindings to the method as well as any other bindings required to execute the method and return its result. The proxy then passes the method request to its scheduler, which enqueues it on the activation list. If the method is defined as a two-way invocation [OMG98a], a future is returned to the client. No future is returned if a method is a one-way, which means it has no return values.

§ Method request execution. The active object's scheduler runs continuously in a different thread than its clients. The scheduler monitors its activation list and determines which method request(s) have become runnable by calling their guard method. When a method request becomes runnable the scheduler removes it, binds the request to its servant, and dispatches the appropriate method on the servant. When this method is called, it can access and update the state of its servant and create its result if it is a two-way method invocation.

§ Completion. In this phase the result, if any, is stored in the future and the active object's scheduler returns to monitor the activation list for runnable method requests. After a two-way method completes, clients can retrieve its result via the future. In general, any clients that rendezvous with the future can obtain its result. The method request and future can be deleted explicitly or garbage collected when they are no longer referenced.

Implementation

Five activities show how to implement the Active Object pattern.

1. Implement the servant. A servant defines the behavior and state being modeled as an active object. In addition, a servant may contain predicate methods used to determine when to execute method requests.

For each remote consumer in our gateway example there is a consumer handler containing a TCP connection to a consumer process running on a remote machine. Each consumer handler contains a message queue modeled as an active object and implemented with an MQ_Servant. This active object stores messages passed from suppliers to the gateway while they are waiting to be sent to the remote consumer.[2] The following C++ class illustrates the MQ_Servant class:

class MQ_Servant {
public:
    // Constructor and destructor.
    MQ_Servant (size_t mq_size);
    ~MQ_Servant ();

    // Message queue implementation operations.
    void put (const Message &msg);
    Message get ();

    // Predicates.
    bool empty () const;
    bool full () const;
private:
    // Internal queue representation, e.g., a circular
    // array or a linked list, that does not use any
    // internal synchronization mechanism.
};

The put() and get() methods implement the message insertion and removal operations on the queue, respectively. The servant defines two predicates, empty() and full(), that distinguish three internal states: empty, full, and neither empty nor full. These predicates are used to determine when put() and get() methods can be called on the servant.

In general, the synchronization mechanisms that protect a servant's critical sections from concurrent access should not be tightly coupled with the servant, which should just implement application functionality. Instead, the synchronization mechanisms
should be associated with the method requests. This design avoids the inheritance anomaly problem [MWY91], which inhibits the reuse of servant implementations if subclasses require different synchronization policies than base classes. Thus, a change to the synchronization constraints of the active object need not affect its servant implementation.

The MQ_Servant class is designed to omit synchronization mechanisms from a servant. The method implementations in the MQ_Servant class, which are omitted for brevity, therefore need not contain any synchronization mechanisms.
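
For illustration only, the omitted method bodies might look like the following minimal sketch, which assumes a circular buffer built on std::vector; the Message type is merely a stand-in for the application's message class, and nothing here is prescribed by the pattern:

#include <cstddef>
#include <vector>

struct Message { /* application-defined payload */ };   // stand-in type

class MQ_Servant {
public:
    MQ_Servant (size_t mq_size)
        : queue_ (mq_size), head_ (0), tail_ (0), count_ (0) { }

    // Insert <msg> at the tail of the circular buffer. No locking
    // here by design -- the scheduler only dispatches put() when
    // the guard !full() holds.
    void put (const Message &msg) {
        queue_[tail_] = msg;
        tail_ = (tail_ + 1) % queue_.size ();
        ++count_;
    }

    // Remove and return the message at the head of the buffer;
    // only dispatched when the guard !empty() holds.
    Message get () {
        Message msg = queue_[head_];
        head_ = (head_ + 1) % queue_.size ();
        --count_;
        return msg;
    }

    // Predicates consulted by the method requests' guards.
    bool empty () const { return count_ == 0; }
    bool full () const  { return count_ == queue_.size (); }

private:
    std::vector<Message> queue_;   // circular buffer storage
    size_t head_, tail_, count_;
};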

2. Implement the invocation infrastructure. In this activity, we describe the infrastructure necessary for clients to invoke methods on an active object. This infrastructure consists of a proxy that creates method requests, which can be implemented via two sub-activities.

1. Implement the proxy. The proxy provides clients with an interface to the servant's methods. For each method invocation by a client, the proxy creates a concrete method request. Each method request is an abstraction for the method's context, which is also called the closure of the method. Typically, this context includes the method parameters, a binding to the servant the method will be applied to, a future for the result, and the code that executes the method request.

In our gateway the MQ_Proxy provides the following interface to the MQ_Servant defined in implementation activity 1 (375):

class MQ_Proxy {
public:
    // Bound the message queue size.
    enum { MQ_MAX_SIZE = /* ... */ };

    MQ_Proxy (size_t size = MQ_MAX_SIZE)
        : servant_ (size), scheduler_ (size) { }

    // Schedule <put> to execute on the active object.
    void put (const Message &msg) {
        Method_Request *mr = new Put (&servant_, msg);
        scheduler_.insert (mr);
    }

    // Return a <Message_Future> as the "future" result of
    // an asynchronous <get> method on the active object.
    Message_Future get () {
        Message_Future result;
        Method_Request *mr = new Get (&servant_, result);
        scheduler_.insert (mr);
        return result;
    }

    // empty() and full() predicate implementations ...
private:
    // The servant that implements the active object
    // methods and a scheduler for the message queue.
    MQ_Servant servant_;
    MQ_Scheduler scheduler_;
};

The MQ_Proxy is a factory [GoF95] that constructs instances of method requests and passes them to a scheduler, which queues them for subsequent execution in a separate thread.

Multiple client threads in a process can share the same proxy. A proxy method need not be serialized because it does not change state after it is created. Its scheduler and activation list are responsible for any necessary internal serialization.

Our gateway example contains many supplier handlers that receive and route messages to peers via many consumer handlers. Several supplier handlers can invoke methods using the proxy that belongs to a single consumer handler without the need for any explicit synchronization.

2. Implement the method requests. Method requests can be considered as command objects [GoF95]. A method request class declares an interface used by all concrete method requests. It provides schedulers with a uniform interface that allows them to be decoupled from specific knowledge about how to evaluate synchronization constraints or trigger the execution of concrete method requests. Typically, this interface declares a can_run() method that defines a hook method guard that checks when it is possible to execute the method request. It also declares a call() method that defines a hook for executing a method request on the servant.

The methods in a method request class must be defined by subclasses. There should be one concrete method request class for each method defined in the proxy. The can_run() method is often implemented with the help of the servant's predicates.

In our gateway example a Method_Request base class defines two virtual hook methods, which we call can_run() and call():

class Method_Request {
public:
    // Evaluate the synchronization constraint.
    virtual bool can_run () const = 0;

    // Execute the method.
    virtual void call () = 0;
};

We then define two subclasses of Method_Request: class Put corresponds to the put() method call on a proxy and class Get corresponds to the get() method call. Both classes contain a pointer to the MQ_Servant. The Get class can be implemented as follows:

class Get : public Method_Request {
public:
    Get (MQ_Servant *rep, const Message_Future &f)
        : servant_ (rep), result_ (f) { }

    virtual bool can_run () const {
        // Synchronization constraint: cannot call a
        // <get> method until queue is not empty.
        return !servant_->empty ();
    }

    virtual void call () {
        // Bind dequeued message to the future result.
        result_ = servant_->get ();
    }

private:
    MQ_Servant *servant_;
    Message_Future result_;
};

Note how the can_run() method uses the MQ_Servant's empty() predicate to allow a scheduler to determine when the Get method request can execute. When the method request does execute, the active object's scheduler invokes its call() hook method. This call() hook uses the Get method request's run-time binding to MQ_Servant to invoke the servant's get() method, which is executed in the context of that servant. It does not require any explicit serialization mechanisms, however, because the active object's scheduler enforces all the necessary synchronization constraints via the method request can_run() methods.
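
For completeness, the corresponding Put method request, which is mentioned above but not listed, could mirror the Get class; the following is just one possible sketch:

class Put : public Method_Request {
public:
    Put (MQ_Servant *rep, const Message &msg)
        : servant_ (rep), msg_ (msg) { }

    virtual bool can_run () const {
        // Synchronization constraint: cannot call a <put>
        // method until the queue is not full.
        return !servant_->full ();
    }

    virtual void call () {
        // Insert the message into the servant's queue.
        servant_->put (msg_);
    }

private:
    MQ_Servant *servant_;
    Message msg_;
};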

For each public two-way method in the proxy that returns a value, such as the get() method in our gateway example, the proxy passes a future to the constructor of the corresponding method request class. This future is returned to the client thread that calls the method, as discussed in implementation activity 5 (384).

3. Implement the activation list. Each method request is inserted into an activation list. This list can be implemented as a synchronized bounded buffer that is shared between the client threads and the thread in which the active object's scheduler and servant run. An activation list can also provide a robust iterator [Kof93] [CarEl95] that allows its scheduler to traverse and remove its elements.

The activation list is often designed using concurrency control patterns, such as Monitor Object (399), that use common synchronization mechanisms like condition variables and mutexes [Ste98]. When these are used in conjunction with a timer mechanism, a scheduler thread can determine how long to wait for certain operations to complete. For example, timed waits can be used to bound the time spent trying to remove a method request from an empty activation list or to insert into a full activation list.[3] If the timeout expires, control returns to the calling thread and the method request is not executed.

For our gateway example we specify a class Activation_List as follows:

class Activation_List {
public:
    // Block for an "infinite" amount of time waiting
    // for <insert> and <remove> methods to complete.
    enum { INFINITE = -1 };

    // Define a "trait".
    typedef Activation_List_Iterator iterator;

    // Constructor creates the list with the specified
    // high water mark that determines its capacity.
    Activation_List (size_t high_water_mark);

    // Insert <method_request> into the list, waiting up
    // to <timeout> amount of time for space to become
    // available in the queue. Throws the <System_Ex>
    // exception if <timeout> expires.
    void insert (Method_Request *method_request,
                 Time_Value *timeout = 0);

    // Remove <method_request> from the list, waiting up
    // to <timeout> amount of time for a <method_request>
    // to be inserted into the list. Throws the
    // <System_Ex> exception if <timeout> expires.
    void remove (Method_Request *&method_request,
                 Time_Value *timeout = 0);

private:
    // Synchronization mechanisms, e.g., condition
    // variables and mutexes, and the queue implemen-
    // tation, e.g., an array or a linked list, go here.
};

The insert() and remove() methods provide a 'bounded-buffer' producer/consumer [Grand98] synchronization model. This design allows a scheduler thread and multiple client threads to remove and insert Method_Requests simultaneously without corrupting the internal state of an Activation_List. Client threads play the role of producers and insert Method_Requests via a proxy. A scheduler thread plays the role of a consumer. It removes Method_Requests from the Activation_List when their guards evaluate to 'true'. It then invokes their call() hooks to execute servant methods.
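
To make this bounded-buffer design concrete, the following sketch shows one way insert() and remove() could be written as a monitor using standard C++ synchronization primitives; the millisecond timeout parameter, the std::deque representation, and std::runtime_error standing in for System_Ex are illustrative assumptions, and the robust iterator interface used by the scheduler example is omitted:

#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <stdexcept>

class Method_Request;  // as declared earlier in this pattern

class Activation_List {
public:
    explicit Activation_List (size_t high_water_mark)
        : high_water_mark_ (high_water_mark) { }

    // Block until space is available, or throw if <timeout_ms>
    // (>= 0) expires first.
    void insert (Method_Request *mr, long timeout_ms = -1) {
        std::unique_lock<std::mutex> lock (mutex_);
        auto not_full = [this] { return queue_.size () < high_water_mark_; };
        if (timeout_ms < 0)
            not_full_.wait (lock, not_full);
        else if (!not_full_.wait_for (lock,
                                      std::chrono::milliseconds (timeout_ms),
                                      not_full))
            throw std::runtime_error ("insert timed out");  // System_Ex stand-in
        queue_.push_back (mr);
        not_empty_.notify_one ();
    }

    // Block until a request is available, or throw on timeout.
    void remove (Method_Request *&mr, long timeout_ms = -1) {
        std::unique_lock<std::mutex> lock (mutex_);
        auto not_empty = [this] { return !queue_.empty (); };
        if (timeout_ms < 0)
            not_empty_.wait (lock, not_empty);
        else if (!not_empty_.wait_for (lock,
                                       std::chrono::milliseconds (timeout_ms),
                                       not_empty))
            throw std::runtime_error ("remove timed out");  // System_Ex stand-in
        mr = queue_.front ();
        queue_.pop_front ();
        not_full_.notify_one ();
    }

private:
    size_t high_water_mark_;
    std::deque<Method_Request *> queue_;
    std::mutex mutex_;
    std::condition_variable not_empty_, not_full_;
};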

4. Implement the active object's scheduler. A scheduler is a command processor [POSA1] that manages the activation list and executes pending method requests whose synchronization constraints have been met. The public interface of a scheduler often provides one method for the proxy to insert method requests into the activation list and another method that dispatches method requests to the servant.

We define the following MQ_Scheduler class for our gateway:

class MQ_Scheduler {
public:
    // Initialize the <Activation_List> to have
    // the specified capacity and make <MQ_Scheduler>
    // run in its own thread of control.
    MQ_Scheduler (size_t high_water_mark);

    // ... Other constructors/destructors, etc.

    // Put <Method_Request> into <Activation_List>. This
    // method runs in the thread of its client, i.e.
    // in the proxy's thread.
    void insert (Method_Request *mr) {
        act_list_.insert (mr);
    }

    // Dispatch the method requests on their servant
    // in its scheduler's thread of control.
    virtual void dispatch ();

private:
    // List of pending Method_Requests.
    Activation_List act_list_;

    // Entry point into the new thread.
    static void *svc_run (void *arg);
};

A scheduler executes its dispatch() method in a different thread of control than its client threads. Each client thread uses a proxy to insert method requests in an active object scheduler's activation list. This scheduler monitors the activation list in its own thread, selecting a method request whose guard evaluates to 'true,' that is, whose synchronization constraints are met. This method request is then removed from the activation list and executed by invoking its call() hook method.

In our gateway example the constructor of MQ_Scheduler initializes the Activation_List and uses the Thread_Manager wrapper facade (47) to spawn a new thread of control:

MQ_Scheduler::MQ_Scheduler (size_t high_water_mark)
    : act_list_ (high_water_mark) {
    // Spawn separate thread to dispatch method requests.
    Thread_Manager::instance ()->spawn (&svc_run, this);
}

The Thread_Manager::spawn() method is passed a pointer to the static MQ_Scheduler::svc_run() method and a pointer to the MQ_Scheduler object. The svc_run() static method is the entry point into the newly created thread of control. This method is simply an adapter [GoF95] that calls the MQ_Scheduler::dispatch() method on the scheduler object passed in through its argument:

void *MQ_Scheduler::svc_run (void *args) {
    MQ_Scheduler *this_obj =
        static_cast<MQ_Scheduler *> (args);

    this_obj->dispatch ();
    return 0;   // Not reached -- dispatch() loops indefinitely.
}

The dispatch() method determines the order in which Put and Get method requests are processed based on the underlying MQ_Servant predicates empty() and full(). These predicates reflect the state of the servant, such as whether the message queue is empty, full, or neither.

By evaluating these predicate constraints via the method request can_run() methods, a scheduler can ensure fair access to the MQ_Servant:

void MQ_Scheduler::dispatch () {
    // Iterate continuously in a separate thread.
    for (;;) {
        Activation_List::iterator request;

        // The iterator's <begin> method blocks
        // when the <Activation_List> is empty.
        for (request = act_list_.begin ();
             request != act_list_.end ();
             ++request) {
            // Select a method request whose
            // guard evaluates to true.
            if ((*request).can_run ()) {
                // Take <request> off the list.
                act_list_.remove (*request);
                (*request).call ();
                delete *request;
            }
            // Other scheduling activities can go here,
            // e.g., to handle when no <Method_Request>s
            // in the <Activation_List> have <can_run>
            // methods that evaluate to true.
        }
    }
}

In our example the MQ_Scheduler::dispatch() implementation iterates continuously, executing the next method request whose can_run() method evaluates to true. Scheduler implementations can be more sophisticated, however, and may contain variables that represent the servant's synchronization state.

For example, to implement a multiple-readers/single-writer synchronization policy, a prospective writer will call 'write' on the proxy, passing the data to write. Similarly, readers will call 'read' and obtain a future as their return value. The active object's scheduler maintains several counter variables that keep track of the synchronization state, such as the number of read and write requests. The scheduler also maintains knowledge about the identity of the prospective writers.

The active object's scheduler can use these synchronization state counters to determine when a single writer can proceed, that is, when the current number of readers is zero and no write request from a different writer is currently pending execution. When such a write request arrives, a scheduler may choose to dispatch the writer to ensure fairness. In contrast, when read requests arrive and the servant can satisfy them because it is not empty, its scheduler can block all writing activity and dispatch read requests first.

The synchronization state counter variable values described above are independent of the servant's state because they are only used by its scheduler to enforce the correct synchronization policy on behalf of the servant. The servant focuses solely on its task to temporarily store client-specific application data. In contrast, its scheduler focuses on coordinating multiple readers and writers. This design enhances modularity and reusability.
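
For illustration, a scheduler might keep this synchronization state in a small helper class such as the following sketch, whose names are purely illustrative, and consult it from the guards of hypothetical Read and Write method requests:

#include <cstddef>

// Synchronization-state bookkeeping a scheduler could use to
// enforce a multiple-readers/single-writer policy.
class RW_Scheduler_State {
public:
    RW_Scheduler_State () : active_readers_ (0), writer_active_ (false) { }

    // Guard for a <write> method request: no active readers
    // and no writer currently executing.
    bool writer_can_run () const {
        return active_readers_ == 0 && !writer_active_;
    }

    // Guard for a <read> method request; <servant_empty> is the
    // servant predicate the guard also consults.
    bool reader_can_run (bool servant_empty) const {
        return !writer_active_ && !servant_empty;
    }

    // Bookkeeping the scheduler invokes around each dispatch.
    void start_read ()  { ++active_readers_; }
    void end_read ()    { --active_readers_; }
    void start_write () { writer_active_ = true; }
    void end_write ()   { writer_active_ = false; }

private:
    size_t active_readers_;
    bool writer_active_;
};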

A scheduler can support multiple synchronization policies by using the Strategy pattern [GoF95]. Each synchronization policy is encapsulated in a separate strategy class. The scheduler, which plays the context role in the Strategy pattern, is then configured with a particular synchronization strategy it uses to execute all subsequent scheduling decisions.

5. Determine rendezvous and return value policy. The rendezvous policy determines how clients obtain return values from methods invoked on active objects. A rendezvous occurs when an active object servant executing in one thread passes a return value to the client that invoked the method running in another thread. Implementations of the Active Object pattern often choose from the following rendezvous and return value policies:

§ Synchronous waiting. Block the client thread synchronously in the proxy until the scheduler dispatches the method request and the result is computed and stored in the future.

§ Synchronous timed wait. Block for a bounded amount of time and fail if the active object's scheduler does not dispatch the method request within that time period. If the timeout is zero the client thread 'polls', that is, it returns to the caller without queueing the method request if its scheduler cannot dispatch it immediately.

§ Asynchronous. Queue the method call and return control to the client immediately. If the method is a two-way invocation that produces a result then some form of future must be used to provide synchronized access to the value, or to the error status if the method call fails.

The future construct allows two-way asynchronous invocations [ARSK00] that return a value to the client. When a servant completes the method execution, it acquires a write lock on the future and updates the future with its result. Any client threads that are blocked waiting for the result are awakened and can access the result concurrently. A future can be garbage-collected after the writer and all reader threads no longer reference it. In languages like C++, which do not support garbage collection, futures can be reclaimed when they are no longer in use via idioms like Counted Pointer [POSA1].

In our gateway example the get() method invoked on the MQ_Proxy ultimately results in the Get::call() method being dispatched by the MQ_Scheduler, as shown in implementation activity 2 (378). The MQ_Proxy::get() method returns a value, so a Message_Future is returned to the client that calls it:

class Message_Future {
public:
    // Binds <this> and <f> to the same
    // <Message_Future_Implementation>.
    Message_Future (const Message_Future &f);

    // Initializes the <Message_Future_Implementation> to
    // point to <message> immediately.
    Message_Future (const Message &message);

    // Creates a <Message_Future_Implementation>.
    Message_Future ();

    // Binds <this> and <f> to the same
    // <Message_Future_Implementation>, which is created
    // if necessary.
    void operator= (const Message_Future &f);

    // Block up to <timeout> time waiting to obtain result
    // of an asynchronous method invocation. Throws
    // <System_Ex> exception if <timeout> expires.
    Message result (Time_Value *timeout = 0) const;

private:
    // <Message_Future_Implementation> uses the Counted
    // Pointer idiom.
    Message_Future_Implementation *future_impl_;
};

The Message_Future is implemented using the Counted Pointer idiom [POSA1]. This idiom simplifies memory management for dynamically allocated C++ objects by using a reference counted Message_Future_Implementation body that is accessed solely through the Message_Future handle.
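
Although the implementation body itself is not shown here, the following sketch illustrates one way a reference-counted future body could be written with standard C++ primitives; all names are illustrative assumptions:

#include <condition_variable>
#include <mutex>

// Reference-counted body behind a future handle: clients block
// in result() until a servant publishes the value via set().
template <class T>
class Future_Body {
public:
    Future_Body () : ready_ (false), ref_count_ (1) { }

    // Called on the servant/scheduler side to publish the result.
    void set (const T &value) {
        std::lock_guard<std::mutex> guard (mutex_);
        value_ = value;
        ready_ = true;
        ready_cond_.notify_all ();   // wake all waiting clients
    }

    // Called by clients; blocks until the result is available.
    T result () {
        std::unique_lock<std::mutex> lock (mutex_);
        ready_cond_.wait (lock, [this] { return ready_; });
        return value_;
    }

    // Counted Pointer bookkeeping used by the handle class.
    void attach () {
        std::lock_guard<std::mutex> guard (mutex_);
        ++ref_count_;
    }
    bool detach () {   // returns true when the body can be deleted
        std::lock_guard<std::mutex> guard (mutex_);
        return --ref_count_ == 0;
    }

private:
    T value_;
    bool ready_;
    int ref_count_;
    std::mutex mutex_;
    std::condition_variable ready_cond_;
};

A handle such as Message_Future would then forward result() to its body and call attach() and detach() in its copy constructor, assignment operator, and destructor, deleting the body when detach() returns true.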

In general a client may choose to evaluate the result value from a future immediately, in which case the client blocks until the scheduler executes the method request. Conversely, the evaluation of a return result from a method invocation on an active object can be deferred. In this case the client thread and the thread executing the method can both proceed asynchronously.

In our gateway example a consumer handler running in a separate thread may choose to block until new messages arrive from suppliers:

MQ_Proxy message_queue;

// Obtain future and block thread until message arrives.
Message_Future future = message_queue.get ();
Message msg = future.result ();

// Transmit message to the consumer.
send (msg);

Conversely, if messages are not available immediately, a consumer handler can store the Message_Future return value from message_queue and perform other 'book-keeping' tasks, such as exchanging keep-alive messages to ensure its consumer is still active. When the consumer handler is finished with these tasks, it can block until a message arrives from suppliers:

// Obtain a future (does not block the client).
Message_Future future = message_queue.get ();

// Do something else here...

// Evaluate future and block if result is not available.
Message msg = future.result ();
send (msg);

Example Resolved

In our gateway example, the gateway's supplier and consumer handlers are local proxies [POSA1] [GoF95] for remote suppliers and consumers, respectively. Supplier handlers receive messages from remote suppliers and inspect address fields in the messages. The address is used as a key into a routing table that identifies which remote consumer will receive the message.

The routing table maintains a map of consumer handlers, each of which is responsible for delivering messages to its remote consumer over a separate TCP connection. To handle flow control over various TCP connections, each consumer handler contains a message queue implemented using the Active Object pattern. This design decouples supplier and consumer handlers so that they can run concurrently and block independently.
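The routing table itself is not shown here; a minimal sketch, assuming consumer addresses are strings (the key type returned by msg.address() is not specified in this example), could look like this:

#include <map>
#include <string>

class Consumer_Handler;   // defined below in this example

class Routing_Table {
public:
    // Register <handler> as the destination for <address>.
    void bind (const std::string &address, Consumer_Handler *handler) {
        table_[address] = handler;
    }

    // Return the handler for <address>, or 0 if none is registered.
    Consumer_Handler *find (const std::string &address) const {
        std::map<std::string, Consumer_Handler *>::const_iterator i =
            table_.find (address);
        return i == table_.end () ? 0 : i->second;
    }

private:
    std::map<std::string, Consumer_Handler *> table_;
};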

The Consumer_Handler class is defined as follows:

class Consumer_Handler {
public:
    // Constructor spawns the active object's thread.
    Consumer_Handler ();

    // Put the message into the queue.
    void put (const Message &msg) { msg_q_.put (msg); }

private:
    MQ_Proxy msg_q_;          // Proxy to the Active Object.
    SOCK_Stream connection_;  // Connection to consumer.

    // Entry point into the new thread.
    static void *svc_run (void *arg);
};

Supplier_Handlers running in their own threads can put messages in the appropriate Consumer_Handler's message queue active object:

void Supplier_Handler::route_message (const Message &msg) {
    // Locate the appropriate consumer based on the
    // address information in <Message>.
    Consumer_Handler *consumer_handler =
        routing_table_.find (msg.address ());

    // Put the Message into the Consumer Handler's queue.
    consumer_handler->put (msg);
}

To process the messages inserted into its queue, each Consumer_Handler uses the Thread_Manager wrapper facade (47) to spawn a separate thread of control in its constructor:

Consumer_Handler::Consumer_Handler () {
    // Spawn a separate thread to get messages from the
    // message queue and send them to the consumer.
    Thread_Manager::instance ()->spawn (&svc_run, this);
    // ...
}

This new thread executes the svc_run() method entry point, which gets the messages placed into the queue by supplier handler threads and sends them to the consumer over the TCP connection:

void *Consumer_Handler::svc_run (void *args) {
    Consumer_Handler *this_obj =
        static_cast<Consumer_Handler *> (args);

    for (;;) {
        // Block thread until a <Message> is available.
        Message msg = this_obj->msg_q_.get ().result ();

        // Transmit <Message> to the consumer over the
        // TCP connection.
        this_obj->connection_.send (msg, msg.length ());
    }
}

Every Consumer_Handler object uses the message queue that is implemented as an active object and runs in its own thread. Therefore its send() operation can block without affecting the quality of service of other Consumer_Handler objects.

Variants

Multiple Roles. If an active object implements multiple roles, each used by particular types of client, a separate proxy can be introduced for each role. By using the Extension Interface pattern (141), clients can obtain the proxies they need. This design helps separate concerns because a client only sees the particular methods of an active object it needs for its own operation, which further simplifies an active object's evolution. For example, new services can be added to the active object by providing new extension interface proxies without changing existing ones. Clients that do not need access to the new services are unaffected by the extension and need not even be recompiled.

Integrated Scheduler. To reduce the number of components needed to implement the Active Object pattern, the roles of the proxy and servant can be integrated into its scheduler component. Likewise, the transformation of a method call on a proxy into a method request can also be integrated into the scheduler. However, servants still execute in a different thread than proxies.

Here is an implementation of the message queue using an integrated scheduler:

class MQ_Scheduler {
public:
    MQ_Scheduler (size_t size)
        : servant_ (size), act_list_ (size) { }

    // ... other constructors/destructors, etc.

    void put (const Message m) {
        Method_Request *mr = new Put (&servant_, m);
        act_list_.insert (mr);
    }

    Message_Future get () {
        Message_Future result;
        Method_Request *mr = new Get (&servant_, result);
        act_list_.insert (mr);
        return result;
    }

    // Other methods ...
private:
    MQ_Servant servant_;
    Activation_List act_list_;
    // ...
};

By centralizing the point at which method requests are generated, the Active Object pattern implementation can be simplified because it has fewer components. The drawback, of course, is that a scheduler must know the type of the servant and proxy, which makes it hard to reuse the same scheduler for different types of active objects.

Message Passing. A further refinement of the integrated scheduler variant is to remove the proxy and servant altogether and use direct message passing between the client thread and the active object's scheduler thread.

For example, consider the following scheduler implementation:

class Scheduler {
public:
    Scheduler (size_t size) : act_list_ (size) { }

    // ... other constructors/destructors, etc.

    void insert (Message_Request *message_request) {
        act_list_.insert (message_request);
    }

    virtual void dispatch () {
        for (;;) {
            Message_Request *mr;

            // Block waiting for next request to arrive.
            act_list_.remove (mr);

            // Process the message request <mr>...
        }
    }
    // ...
private:
    Activation_List act_list_;
    // ...
};

In this variant, there is no proxy, so clients create an appropriate type of message request directly and call insert() themselves, which enqueues the request into the activation list. Likewise, there is no servant, so the dispatch() method running in a scheduler's thread simply dequeues the next message request and processes the request according to its type.

In general it is easier to develop a message-passing mechanism than it is to develop an active object because there are fewer components. Message passing can be more tedious and error-prone, however, because application developers, not active object developers, must program the proxy and servant logic. As a result, message passing implementations are less type-safe than active object implementations because their interfaces are implicit rather than explicit. In addition, it is harder for application developers to distribute clients and servers via message passing because there is no proxy to encapsulate the marshaling and demarshaling of data.

Polymorphic Futures [LK95]. A polymorphic future allows parameterization of the eventual result type represented by the future and enforces the necessary synchronization. In particular, a polymorphic future describes a typed future that client threads can use to retrieve a method request's result. Whether a client blocks on a future depends on whether or not a result has been computed.

The following class is a polymorphic future template for C++:

template <class TYPE>
class Future {
    // This class can be used to return results from
    // two-way asynchronous method invocations.
public:
    // Constructor and copy constructor that binds <this>
    // and <r> to the same <Future> representation.
    Future ();
    Future (const Future<TYPE> &r);

    // Destructor.
    ~Future ();

    // Assignment operator that binds <this> and <r> to
    // the same <Future> representation.
    void operator= (const Future<TYPE> &r);

    // Cancel a <Future> and reinitialize it.
    void cancel ();

    // Block up to <timeout> time waiting to obtain result
    // of an asynchronous method invocation. Throws
    // <System_Ex> exception if <timeout> expires.
    TYPE result (Time_Value *timeout = 0) const;

private:
    // ...
};

A client can use a polymorphic future as follows:

try {
    // Obtain a future (does not block the client).
    Future<Message> future = message_queue.get ();

    // Do something else here...

    // Evaluate future and block for up to 1 second
    // waiting for the result to become available.
    Time_Value timeout (1);
    Message msg = future.result (&timeout);

    // Do something with the result ...
} catch (System_Ex &ex) {
    if (ex.status () == ETIMEDOUT) { /* handle timeout */ }
}

Timed method invocations. The activation list illustrated in implementation activity 3 (379) defines a mechanism that can bound the amount of time a scheduler waits to insert or remove a method request. Although the examples we showed earlier in the pattern do not use this feature, many applications can benefit from timed method invocations. To implement this feature we can simply export the timeout mechanism via schedulers and proxies.

In our gateway example, the MQ_Proxy can be modified so that its methods allow clients to bound the amount of time they are willing to wait to execute:

class MQ_Proxy {
public:
    // Schedule <put> to execute, but do not block longer
    // than <timeout> time. Throws <System_Ex>
    // exception if <timeout> expires.
    void put (const Message &msg, Time_Value *timeout = 0);

    // Return a <Message_Future> as the "future" result of
    // an asynchronous <get> method on the active object,
    // but do not block longer than <timeout> amount of
    // time. Throws the <System_Ex> exception if
    // <timeout> expires.
    Message_Future get (Time_Value *timeout = 0);
};

If timeout is 0 both get() and put() will block indefinitely until a Message is either removed from or inserted into the scheduler's activation list, respectively. If the timeout expires, the System_Ex exception defined in the Wrapper Facade pattern (47) is thrown with a status() value of ETIMEDOUT and the client must catch it.
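
A brief client-side usage sketch, assuming a seconds-based Time_Value constructor as in the polymorphic future example shown earlier and a default-constructible Message:

MQ_Proxy message_queue;
Message msg;             // message to route (assumed default-constructible)
Time_Value timeout (5);  // willing to wait at most five seconds

try {
    // Enqueue the message, but give up if the activation
    // list stays full for longer than <timeout>.
    message_queue.put (msg, &timeout);
} catch (System_Ex &ex) {
    if (ex.status () == ETIMEDOUT) {
        // Could not enqueue in time -- e.g., drop the message
        // or report congestion back to the supplier.
    }
}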

To complete our support for timed method invocations, we also must add timeout support to the MQ_Scheduler:

class MQ_Scheduler {
public:
    // Insert a method request into the <Activation_List>.
    // This method runs in the thread of its client, i.e.
    // in the proxy's thread, but does not block longer
    // than <timeout> amount of time. Throws the
    // <System_Ex> exception if the <timeout> expires.
    void insert (Method_Request *method_request,
                 Time_Value *timeout) {
        act_list_.insert (method_request, timeout);
    }
};

Distributed Active Object. In this variant a distribution boundary exists between a proxy and a scheduler, rather than just a threading boundary. This pattern variant introduces two new participants:

§ A client-side proxy plays the role of a stub, which marshals method parameters into a method request that is sent across a network and executed by a servant in a separate server address space.

§ A server-side skeleton, which demarshals method request parameters before they are passed to a server's servant method.

The Distributed Active Object pattern variant is therefore similar to the Broker pattern [POSA1]. The primary difference is that a Broker usually coordinates the processing of many objects, whereas a distributed active object just handles a single object.

Thread Pool Active Object. This generalization of the Active Object pattern supports multiple servant threads per active object to increase throughput and responsiveness. When not processing requests, each servant thread in a thread pool active object blocks on a single activation list. The active object scheduler assigns a new method request to an available servant thread in the pool as soon as one is ready to be executed.

A single servant implementation is shared by all the servant threads in the pool. This design cannot therefore be used if the servant methods do not protect their internal state via some type of synchronization mechanism, such as a mutex.
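
A hedged sketch of this variant, reusing the Activation_List sketched earlier together with std::thread; guard evaluation is omitted for brevity, and the servant is assumed to synchronize its own internal state as noted above:

#include <cstddef>
#include <thread>
#include <vector>

class Thread_Pool_Scheduler {
public:
    Thread_Pool_Scheduler (size_t n_threads, size_t high_water_mark)
        : act_list_ (high_water_mark) {
        // Spawn <n_threads> workers that all share one activation list.
        for (size_t i = 0; i < n_threads; ++i)
            workers_.push_back (std::thread (&Thread_Pool_Scheduler::dispatch,
                                             this));
    }

    // Called by proxies in client threads.
    void insert (Method_Request *mr) { act_list_.insert (mr); }

private:
    // Each worker loops, executing whatever request it dequeues.
    void dispatch () {
        for (;;) {
            Method_Request *mr = 0;
            act_list_.remove (mr);   // blocks until a request arrives
            mr->call ();
            delete mr;
        }
    }

    Activation_List act_list_;
    std::vector<std::thread> workers_;
};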

Additional variants of active objects can be found in [Lea99a], Chapter 5: Concurrency Control and Chapter 6: Services in Threads.

Known Uses

ACE Framework [Sch97]. Reusable implementations of the method request, activation list, and future components in the Active Object pattern are provided in the ACE framework. The corresponding classes in ACE are called ACE_Method_Request, ACE_Activation_Queue, and ACE_Future. These components have been used to implement many production concurrent and networked systems [Sch96].

Siemens MedCom. The Active Object pattern is used in the Siemens MedCom framework, which provides a black-box component-based framework for electronic medical imaging systems. MedCom employs the Active Object pattern in conjunction with the Command Processor pattern [POSA1] to simplify client windowing applications that access patient information on various medical servers [JWS98].

Siemens FlexRouting - Automatic Call Distribution [Flex98]. This call center management system uses the Thread Pool variant of the Active Object pattern. Services that a call center offers are implemented as applications of their own. For example, there may be a hot-line application, an ordering application, and a product information application, depending on the types of service offered. These applications support operator personnel that serve various customer requests. Each instance of these applications is a separate servant component. A 'FlexRouter' component, which corresponds to the scheduler, dispatches incoming customer requests automatically to operator applications that can service these requests.

Java JDK 1.3 introduced a mechanism for executing timer-based tasks concurrently in the classes java.util.Timer and java.util.TimerTask. Whenever the scheduled execution time of a task occurs it is executed. Specifically, Timer offers different scheduling functions to clients that allow them to specify when and how often a task should be executed. One-shot tasks are straightforward and recurring tasks can be scheduled at periodic intervals. The scheduling calls are executed in the client's thread, while the tasks themselves are executed in a thread owned by the Timer object. A Timer internal task queue is protected by locks because the two threads outlined above operate on it concurrently.

The task queue is implemented as a priority queue so that the next TimerTask to expire can be identified efficiently. The timer thread simply waits until this expiration. There are no explicit guard methods and predicates because determining when a task is 'ready for execution' simply depends on the arrival of the scheduled time.

Tasks are implemented as subclasses of TimerTask that override its run() hook method. The TimerTask subclasses unify the concepts behind method requests and servants by offering just one class and one interface method via TimerTask.run().

The scheme described above simplifies the Active Object machinery for the purpose of timed execution. There is no proxy and clients call the scheduler—the Timer object—directly. Clients do not invoke an ordinary method and therefore the concurrency is not transparent. Moreover, there are no return value or future objects linked to the run() method. An application can employ several active objects by constructing several Timer objects, each with its own thread and task queue.

Chef in a restaurant. A real-life example of the Active Object pattern is found in restaurants. Waiters and waitresses drop off customer food requests with the chef and continue to service requests from other customers asynchronously while the food is being prepared. The chef keeps track of the customer food requests via some type of worklist. However, the chef may cook the food requests in a different order than they arrived to use available resources, such as stove tops, pots, or pans, most efficiently. When the food is cooked, the chef places the results on top of a counter along with the original request so the waiters and waitresses can rendezvous to pick up the food and serve their customers.

Consequences

The Active Object pattern provides the following benefits:

Enhances application concurrency and simplifies synchronization complexity. Concurrency is enhanced by allowing client threads and asynchronous method executions to run simultaneously. Synchronization complexity is simplified by using a scheduler that evaluates synchronization constraints to guarantee serialized access to servants, in accordance with their state.

Transparently leverages available parallelism. If the hardware and software platforms support multiple CPUs efficiently, this pattern can allow multiple active objects to execute in parallel, subject only to their synchronization constraints.

Method execution order can differ from method invocation order. Methods invoked asynchronously are executed according to the synchronization constraints defined by their guards and by scheduling policies. Thus, the order of method execution can differ from the order of method invocation. This decoupling can help improve application performance and flexibility.

However, the Active Object pattern encounters several liabilities:

Performance overhead. Depending on how an active object's scheduler is implemented—for example in user-space versus kernel-space [SchSu95]—context switching, synchronization, and data movement overhead may occur when scheduling and executing active object method invocations. In general the Active Object pattern is most applicable for relatively coarse-grained objects. In contrast, if the objects are fine-grained, the performance overhead of active objects can be excessive, compared with related concurrency patterns, such as Monitor Object (399).

Complicated debugging. It is hard to debug programs that use the Active Object pattern due to the concurrency and non-determinism of the various active object schedulers and the underlying operating system thread scheduler. In particular, method request guards determine the order of execution. However, the behavior of these guards may be hard to understand and debug. Improperly defined guards can cause starvation, which is a condition where certain method requests never execute. In addition, program debuggers may not support multi-threaded applications adequately.

See Also

The Monitor Object pattern (399) ensures that only one method at a time executes within a thread-safe passive object, regardless of the number of threads that invoke the object's methods concurrently. In general, monitor objects are more efficient than active objects because they incur less context switching and data movement overhead. However, it is harder to add a distribution boundary between client and server threads using the Monitor Object pattern.

It is instructive to compare the Active Object pattern solution in the Example Resolved section with the solution presented in the Monitor Object pattern. Both solutions have similar overall application architectures. In particular, the Supplier_Handler and Consumer_Handler implementations are almost identical.

The primary difference is that the Message_Queue in the Active Object pattern supports sophisticated method request queueing and scheduling strategies. Similarly, because active objects execute in different threads than their clients, there are situations where active objects can improve overall application concurrency by executing multiple operations asynchronously. When these operations complete, clients can obtain their results via futures [Hal85] [LS88].

On the other hand, the Message_Queue itself is easier to program and often more efficient when implemented using the Monitor Object pattern than the Active Object pattern.

The Reactor pattern (179) is responsible for demultiplexing and dispatching multiple event handlers that are triggered when it is possible to initiate an operation without blocking. This pattern is often used in lieu of the Active Object pattern to schedule callback operations to passive objects. Active Object also can be used in conjunction with the Reactor pattern to form the Half-Sync/Half-Async pattern (423).

The Half-Sync/Half-Async pattern (423) decouples synchronous I/O from asynchronous I/O in a system to simplify concurrent programming effort without degrading execution efficiency.

Variants of this pattern use the Active Object pattern to implement its synchronous task layer, the Reactor pattern (179) to implement the asynchronous task layer, and a Producer-Consumer pattern [Lea99a], such as a variant of the Pipes and Filters pattern [POSA1] or the Monitor Object pattern (399), to implement the queueing layer.

The Command Processor pattern [POSA1] separates issuing requests from their execution. A command processor, which corresponds to the Active Object pattern's scheduler, maintains pending service requests that are implemented as commands [GoF95]. Commands are executed on suppliers, which correspond to servants. The Command Processor pattern does not focus on concurrency, however. In fact, clients, the command processor, and suppliers often reside in the same thread of control. Likewise, there are no proxies that represent the servants to clients. Clients create commands and pass them directly to the command processor.

The Broker pattern [POSA1] defines many of the same components as the Active Object pattern. In particular, clients access brokers via proxies and servers implement remote objects via servants. One difference between Broker and Active Object is that there is a distribution boundary between proxies and servants in the Broker pattern, as opposed to a threading boundary between proxies and servants in the Active Object pattern. Another difference is that active objects typically have just one servant, whereas a broker can have many servants.

Credits

The genesis for documenting Active Object as a pattern originated with Greg Lavender [PLoPD2]. Ward Cunningham helped shape this version of the Active Object pattern. Bob Laferriere and Rainer Blome provided useful suggestions that improved the clarity of the pattern's Implementation section. Thanks to Doug Lea for providing many additional insights in [Lea99a].

[1]See the Acceptor-Connector pattern (285) for further details of this example.

[2]The active object message queue in this example is an implementation mechanism that buffers messages to avoid blocking the gateway when flow control occurs on TCP connections. It is not related to the activation list, which is an Active Object pattern participant that stores method requests pending execution. See the Example Resolved section and the Monitor Object pattern (399) for further discussion of the example.

[3]A list is considered 'full' when its current method request count equals its high-water mark.

Monitor Object

The Monitor Object design pattern synchronizes concurrent method execution to ensure that only one method at a time runs within an object. It also allows an object's methods to cooperatively schedule their execution sequences.

Also Known As

Thread-safe Passive Object

Example

Let us reconsider the design of the communication gateway described in the Active Object pattern (369).[4]

The gateway process is a mediator [GoF95] that contains multiple supplier and consumer handler objects. These objects run in separate threads and route messages from one or more remote suppliers to one or more remote consumers. When a supplier handler thread receives a message from a remote supplier, it uses an address field in the message to determine the corresponding consumer handler. The handler's thread then delivers the message to its remote consumer.

When suppliers and consumers reside on separate hosts, the gateway uses a connection-oriented protocol, such as TCP [Ste93], to provide reliable message delivery and end-to-end flow control. Flow control is a protocol mechanism that blocks senders when they produce messages more rapidly than receivers can process them. The entire gateway should not block while waiting for flow control to abate on outgoing TCP connections, however. In particular, incoming TCP connections should continue to be processed and messages should continue to be sent over any non-flow-controlled TCP connections.

To minimize blocking, each consumer handler can contain a thread-safe message queue. Each queue buffers new routing messages it receives from its supplier handler threads. This design decouples supplier handler threads in the gateway process from consumer handler threads, so that all threads can run concurrently and block independently when flow control occurs on various TCP connections.

One way to implement a thread-safe message queue is to apply the Active Object pattern (369) to decouple the thread used to invoke a method from the thread used to execute the method. Active Object may be inappropriate, however, if the entire infrastructure introduced by this pattern is unnecessary. For example, a message queue's enqueue and dequeue methods may not require sophisticated scheduling strategies. In this case, implementing the Active Object pattern's method request, scheduler, and activation list participants incurs unnecessary performance overhead and programming effort.

Instead, the implementation of the thread-safe message queue must be efficient to avoid degrading performance unnecessarily. To avoid tight coupling of supplier and consumer handler implementations, the mechanism should also be transparent to implementors of supplier handlers. Varying either implementation independently would otherwise become prohibitively complex.

Context

Multiple threads of control accessing the same object concurrently.

Problem

Many applications contain objects whose methods are invoked concurrently by multiple client threads. These methods often modify the state of their objects. For such concurrent applications to execute correctly, therefore, it is necessary to synchronize and schedule access to the objects.

In the presence of this problem four forces must be addressed:
§ To separate concerns and protect object state from uncontrolled changes, object-oriented programmers are accustomed to accessing objects only through their interface methods. It is relatively straightforward to extend this object-oriented programming model to protect an object's data from uncontrolled concurrent changes, known as race conditions. An object's interface methods should therefore define its synchronization boundaries, and only one method at a time should be active within the same object.

§ Concurrent applications are harder to program if clients must explicitly acquire and release low-level synchronization mechanisms, such as semaphores, mutexes, or condition variables [IEEE96]. Objects should therefore be responsible for ensuring that any of their methods that require synchronization are serialized transparently, without requiring explicit client intervention.

§ If an object's methods must block during their execution, they should be able to relinquish their thread of control voluntarily, so that methods called from other client threads can access the object. This property helps prevent deadlock and makes it possible to take advantage of concurrency mechanisms available on hardware and software platforms.

§ When a method relinquishes its thread of control voluntarily, it must leave its object in a stable state, that is, object-specific invariants must hold. Similarly, a method must resume its execution within an object only when the object is in a stable state.

Solution

Synchronize the access to an object's methods so that only one method can execute at any one time.

In detail: for each object accessed concurrently by multiple client threads, define it as a monitor object. Clients can access the functions defined by a monitor object only through its synchronized methods. To prevent race conditions on its internal state, only one synchronized method at a time can run within a monitor object. To serialize concurrent access to an object's state, each monitor object contains a monitor lock. Synchronized methods can determine the circumstances under which they suspend and resume their execution, based on one or more monitor conditions associated with a monitor object.

Structure

There are four participants in the Monitor Object pattern:

A monitor object exports one or more methods. To protect the internal state of the monitor object from uncontrolled changes and race conditions, all clients must access the monitor object only through these methods. Each method executes in the thread of the client that invokes it, because a monitor object does not have its own thread of control.[5]

Synchronized methods implement the thread-safe functions exported by a monitor object. To prevent race conditions, only one synchronized method can execute within a monitor object at any one time. This rule applies regardless of the number of threads that invoke the object's synchronized methods concurrently, or the number of synchronized methods in the object's class.

A consumer handler's message queue in the gateway application can be implemented as a monitor object by converting its put() and get() operations into synchronized methods. This design ensures that routing messages can be inserted and removed concurrently by multiple threads without corrupting the queue's internal state.

Each monitor object contains its own monitor lock. Synchronized methods use this lock to serialize method invocations on a per-object basis. Each synchronized method must acquire and release an object's monitor lock when entering or exiting the object. This protocol ensures the monitor lock is held whenever a synchronized method performs operations that access or modify the state of its object.

Monitor condition. Multiple synchronized methods running in separate threads can schedule their execution sequences cooperatively by waiting for and notifying each other via monitor conditions associated with their monitor object. Synchronized methods use their monitor lock in conjunction with their monitor condition(s) to determine the circumstances under which they should suspend or resume their processing.

In the gateway application a POSIX mutex [IEEE96] can be used to implement the message queue's monitor lock. A pair of POSIX condition variables can be used to implement the message queue's not-empty and not-full monitor conditions:
§ When a consumer handler thread attempts to dequeue a routing message from an empty message queue, the queue's get() method must atomically release the monitor lock and suspend itself on the not-empty monitor condition. It remains suspended until the queue is no longer empty, which happens when a supplier handler thread inserts a message into the queue.

§ When a supplier handler thread attempts to enqueue a message into a full queue, the queue's put() method must atomically release the monitor lock and suspend itself on the not-full monitor condition. It remains suspended until the queue is no longer full, which happens when a consumer handler removes a message from the message queue.

Note that the not-empty and not-full monitor conditions both share the same monitor lock.

The structure of the Monitor Object pattern is illustrated in the following class diagram:

Dynamics

The collaborations between participants in the Monitor Object pattern divide into four phases:
§ Synchronized method invocation and serialization. When client thread T1 invokes a synchronized method on a monitor object, the method must first acquire the object's monitor lock. A monitor lock cannot be acquired as long as another synchronized method in thread T2 is executing within the monitor object. In this case, client thread T1 will block until the synchronized method acquires the lock. Once the synchronized method called by T1 has finished executing, the monitor lock is released so that other synchronized methods called by other threads can access the monitor object.

§ Synchronized method thread suspension. If a synchronized method must block or cannot otherwise make immediate progress, it can wait on one of its monitor conditions. This causes it to 'leave' the monitor object temporarily [Hoare74]. The monitor object implementation is responsible for ensuring that it is in a stable state before switching to another thread. When a synchronized method leaves the monitor object, the client's thread is suspended on that monitor condition and the monitor lock is released atomically by the operating system's thread scheduler. Another synchronized method in another thread can now execute within the monitor object.

§ Monitor condition notification. A synchronized method can notify a monitor condition. This operation awakens the thread of a synchronized method that had suspended itself on the monitor condition earlier. A synchronized method can also notify all other synchronized methods that suspended their threads earlier on a monitor condition. In this case all the threads are awakened and one of them at a time can acquire the monitor lock and run within the monitor object.

§ Synchronized method thread resumption. Once a suspended synchronized method thread is notified, its execution can resume at the point where it waited on the monitor condition. The operating system thread scheduler performs this resumption implicitly. The monitor lock is reacquired atomically before the notified thread 're-enters' the monitor object and resumes its execution in the synchronized method.

Implementation

Four activities illustrate how to implement the Monitor Object pattern.

1. Define the monitor object's interface methods. The interface of a monitor object exports a set of methods to clients. Interface methods are often synchronized, that is, only one of them at a time can be executed by a thread within a particular monitor object.

In our gateway example, each consumer handler contains a message queue and a TCP connection. The message queue can be implemented as a monitor object that buffers messages it receives from supplier handler threads. This buffering helps prevent the entire gateway process from blocking whenever consumer handler threads encounter flow control on TCP connections to their remote consumers. The following C++ class defines the interface for our message queue monitor object:

class Message_Queue {
public:
    enum { MAX_MESSAGES = /* ... */ };

    // The constructor defines the maximum number
    // of messages in the queue. This determines
    // when the queue is 'full.'
    Message_Queue (size_t max_messages = MAX_MESSAGES);

    // Put the <Message> at the tail of the queue.
    // If the queue is full, block until the queue
    // is not full.
    /* synchronized */ void put (const Message &msg);

    // Get the <Message> from the head of the queue
    // and remove it. If the queue is empty,
    // block until the queue is not empty.
    /* synchronized */ Message get ();

    // True if the queue is empty, else false.
    /* synchronized */ bool empty () const;

    // True if the queue is full, else false.
    /* synchronized */ bool full () const;
private:
    // ... described later ...
};

The Message_Queue monitor object interface exports four synchronized methods. The empty() and full() methods are predicates that clients can use to distinguish three internal queue states: empty, full, and neither empty nor full. The put() and get() methods enqueue and dequeue messages into and from the queue, respectively, and will block if the queue is full or empty.

2. Define the monitor object's implementation methods. A monitor object often contains internal implementation methods that synchronized interface methods use to perform the object's functionality. This design helps decouple the core monitor object functionality from its synchronization and scheduling logic. It also helps avoid intra-object deadlock and unnecessary locking overhead.

Two conventions, based on the Thread-Safe Interface pattern (345), can be used to structure the separation of concerns between interface and implementation methods in a monitor object:
§ Interface methods only acquire and release monitor locks and wait upon or notify certain monitor conditions. They otherwise forward control to implementation methods that perform the monitor object's functionality.
§ Implementation methods only perform work when called by interface methods. They do not acquire and release the monitor lock, nor do they wait upon or notify monitor conditions explicitly.

Similarly, in accordance with the Thread-Safe Interface pattern, implementation methods should not call any synchronized methods defined in the class interface. This restriction helps to avoid intra-object method deadlock or unnecessary synchronization overhead.

In our gateway, the Message_Queue class defines four implementation methods: put_i(), get_i(), empty_i(), and full_i():

class Message_Queue {
public:
    // ... See above ...
private:
    // Put the <Message> at the tail of the queue, and
    // get the <Message> at its head, respectively.
    void put_i (const Message &msg);
    Message get_i ();

    // True if the queue is empty, else false.
    bool empty_i () const;

    // True if the queue is full, else false.
    bool full_i () const;
};

Implementation methods are often non-synchronized. They must be careful when invoking blocking calls, because the interface method that called the implementation method may have acquired the monitor lock. A blocking thread that owned a lock could therefore delay overall program progress indefinitely.
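To make this hazard concrete, consider a hypothetical variant of get_i() that logs each dequeued message over a blocking network connection; neither the remote_logger_ member nor this variant appears in the gateway example:

// Hypothetical variant of an implementation method. Its caller, the
// synchronized get() interface method, still holds <monitor_lock_>
// while this method runs.
Message Message_Queue::get_i () {
    Message m;
    // ... remove the <Message> at the head of the internal queue ...

    // BAD: this call can block on the network while <monitor_lock_>
    // is held, so every put(), get(), empty(), and full() call on
    // this queue stalls until the logger responds.
    remote_logger_.send (m, m.length ());

    return m;
}

Blocking work of this kind is better performed outside the monitor, for example in the consumer handler thread after get() returns.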

3. Define the monitor object's internal state and synchronization mechanisms. A monitor object contains data members that define its internal state. This state must be protected from corruption by race conditions resulting from unsynchronized concurrent access. A monitor object therefore contains a monitor lock that serializes the execution of its synchronized methods, as well as one or more monitor conditions used to schedule the execution of synchronized methods within a monitor object. Typically there is a separate monitor condition for each of the following situations:
§ Cases in which synchronized methods must suspend their processing to wait for the occurrence of some event or state change; or
§ Cases in which synchronized methods must resume other threads whose synchronized methods have suspended themselves on the monitor condition.

A monitor object method implementation is responsible for ensuring that it is in a stable state before releasing its lock. Stable states can be described by invariants, such as the need for all elements in a message queue to be linked together via valid pointers. The invariant must hold whenever a monitor object method waits on the corresponding condition variable.

Similarly, when the monitor object is notified and the operating system thread scheduler decides to resume its thread, the monitor object method implementation is responsible for ensuring that the invariant is indeed satisfied before proceeding. This check is necessary because other threads may have changed the state of the object between the notification and the resumption. As a result, the monitor object must ensure that the invariant is satisfied before allowing a synchronized method to resume its execution.

A monitor lock can be implemented using a mutex. A mutex makes collaborating threads wait while the thread holding the mutex executes code in a critical section. Monitor conditions can be implemented using condition variables [IEEE96]. A condition variable can be used by a thread to make itself wait until a particular event occurs or an arbitrarily complex condition expression attains a particular stable state. Condition expressions typically access objects or state variables shared between threads. They can be used to implement the Guarded Suspension pattern [Lea99a].
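As a minimal sketch of this combination, using raw POSIX calls [IEEE96] rather than the wrapper facades used in the rest of this pattern, a guarded wait pairs a mutex with a condition variable and re-evaluates its condition expression in a loop; the names below are illustrative only:

#include <pthread.h>

// A condition expression ('data is available') protected by one mutex
// and signaled via one condition variable.
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready = PTHREAD_COND_INITIALIZER;
static bool data_available = false;

void await_data () {
    pthread_mutex_lock (&lock);
    // Re-check the condition expression after every wakeup, because
    // <pthread_cond_wait> can return spuriously and other threads may
    // have consumed the data in the meantime.
    while (!data_available)
        pthread_cond_wait (&ready, &lock);  // releases <lock> atomically
    // ... use the data while <lock> is still held ...
    pthread_mutex_unlock (&lock);
}

void publish_data () {
    pthread_mutex_lock (&lock);
    data_available = true;
    pthread_cond_signal (&ready);           // wake one waiting thread
    pthread_mutex_unlock (&lock);
}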

In our gateway example, the Message_Queue defines its internal state, as illustrated below:

class Message_Queue {
    // ... See above ...
private:
    // ... See above ...

    // Internal queue representation omitted; could be a
    // circular array or a linked list, etc.
    // ...

    // Current number of <Message>s in the queue.
    size_t message_count_;

    // The maximum number of <Message>s that can be
    // in a queue before it's considered 'full.'
    size_t max_messages_;

    // Mutex wrapper facade that protects the queue's
    // internal state from race conditions during
    // concurrent access.
    mutable Thread_Mutex monitor_lock_;

    // Condition variable wrapper facade used in
    // conjunction with <monitor_lock_> to make
    // synchronized method threads wait until the queue
    // is no longer empty.
    Thread_Condition not_empty_;

    // Condition variable wrapper facade used in
    // conjunction with <monitor_lock_> to make
    // synchronized method threads wait until the queue
    // is no longer full.
    Thread_Condition not_full_;
};

A Message_Queue monitor object defines three types of internal state:
§ Queue representation data members. These data members define the internal queue representation. This representation stores the contents of the queue in a circular array or linked list, together with book-keeping information needed to determine whether the queue is empty, full, or neither. The internal queue representation is manipulated only by the put_i(), get_i(), empty_i(), and full_i() implementation methods.

§ Monitor lock data member. The monitor_lock_ is used by a Message_Queue's synchronized methods to serialize their access to the state of the queue's internal representation. A monitor object's lock must be held whenever its state is being changed to ensure that its invariants are satisfied. This monitor lock is implemented using the platform-independent Thread_Mutex class defined in the Wrapper Facade pattern (47).

§ Monitor condition data members. The monitor conditions not_full_ and not_empty_ are used by the put() and get() synchronized methods to suspend and resume themselves when a Message_Queue reaches or leaves its full and empty boundary conditions, respectively. These monitor conditions are implemented using the platform-independent Thread_Condition class defined in the Wrapper Facade pattern (47).

4. Implement all the monitor object's methods and data members. The following two sub-activities can be used to implement all the monitor object methods and internal state defined above.

4.1 Initialize the data members. This sub-activity initializes object-specific data members, as well as the monitor lock and any monitor conditions.

The constructor of Message_Queue creates an empty queue and initializes the monitor conditions not_empty_ and not_full_:

Message_Queue::Message_Queue (size_t max_messages)
    : not_full_ (monitor_lock_),
      not_empty_ (monitor_lock_),
      max_messages_ (max_messages),
      message_count_ (0) { /* ... */ }

In this example, both monitor conditions share the same monitor_lock_. This design ensures that Message_Queue state, such as the message_count_, is serialized properly to prevent race conditions from violating invariants when multiple threads try to put() and get() messages on a queue simultaneously.

4.2 Apply the Thread-Safe Interface pattern. In this sub-activity, the interface and implementation methods are implemented according to the Thread-Safe Interface pattern (345).

In our Message_Queue implementation, two pairs of interface and implementation methods check if a queue is empty, which means it contains no messages, or full, which means it contains max_messages_. We show the interface methods first:

bool Message_Queue::empty () const {
    Guard<Thread_Mutex> guard (monitor_lock_);
    return empty_i ();
}

bool Message_Queue::full () const {
    Guard<Thread_Mutex> guard (monitor_lock_);
    return full_i ();
}

These methods illustrate a simple example of the Thread-Safe Interface pattern (345). They use the Scoped Locking idiom (325) to acquire and release the monitor lock, then forward immediately to their corresponding implementation methods:

bool Message_Queue::empty_i () const {
    return message_count_ == 0;
}

bool Message_Queue::full_i () const {
    return message_count_ == max_messages_;
}

In accordance with the Thread-Safe Interface pattern, these implementation methods assume the monitor_lock_ is held, so they just check for the boundary conditions in the queue.

The put() method inserts a new Message, which is a class defined in the Active Object pattern (369), at the tail of a queue. It is a synchronized method that illustrates a more sophisticated use of the Thread-Safe Interface pattern (345):

void Message_Queue::put (const Message &msg) {
    // Use the Scoped Locking idiom to
    // acquire/release the <monitor_lock_> upon
    // entry/exit to the synchronized method.
    Guard<Thread_Mutex> guard (monitor_lock_);

    // Wait while the queue is full.
    while (full_i ()) {
        // Release <monitor_lock_> and suspend the
        // calling thread waiting for space in the queue.
        // The <monitor_lock_> is reacquired
        // automatically when <wait> returns.
        not_full_.wait ();
    }

    // Enqueue the <Message> at the tail.
    put_i (msg);

    // Notify any thread waiting in <get> that
    // the queue has at least one <Message>.
    not_empty_.notify ();

} // Destructor of <guard> releases <monitor_lock_>.

Note how this public synchronized put() method only performs the synchronization and scheduling logic needed to serialize access to the monitor object and wait while the queue is full. Once there is room in the queue, put() forwards to the put_i() implementation method. This inserts the message into the queue and updates its book-keeping information. Moreover, the put_i() is not synchronized because the put() method never calls it without first acquiring the monitor_lock_. Likewise, the put_i() method need not check to see if the queue is full because it is not called as long as full_i() returns true.

The get() method removes the message at the front of the queue and returns it to the caller:

Message Message_Queue::get () {
    // Use the Scoped Locking idiom to
    // acquire/release the <monitor_lock_> upon
    // entry/exit to the synchronized method.
    Guard<Thread_Mutex> guard (monitor_lock_);

    // Wait while the queue is empty.
    while (empty_i ()) {
        // Release <monitor_lock_> and suspend the
        // calling thread waiting for a new <Message> to
        // be put into the queue. The <monitor_lock_> is
        // reacquired automatically when <wait> returns.
        not_empty_.wait ();
    }

    // Dequeue the first <Message> in the queue
    // and update the <message_count_>.
    Message m = get_i ();

    // Notify any thread waiting in <put> that the
    // queue has room for at least one <Message>.
    not_full_.notify ();
    return m;

    // Destructor of <guard> releases <monitor_lock_>.
}

As before, note how the synchronized get() interface method performs the synchronization and scheduling logic, while forwarding the dequeueing functionality to the get_i() implementation method.

Example Resolved

Internally, our gateway contains instances of two classes, Supplier_Handler and Consumer_Handler. These act as local proxies [GoF95] [POSA1] for remote suppliers and consumers, respectively. Each Consumer_Handler contains a thread-safe Message_Queue object implemented using the Monitor Object pattern. This design decouples supplier handler and consumer handler threads so that they run concurrently and block independently. Moreover, by embedding and automating synchronization inside message queue monitor objects, we can protect their internal state from corruption, maintain invariants, and shield clients from low-level synchronization concerns.

The Consumer_Handler is defined below:

class Consumer_Handler {
public:
    // Constructor spawns a thread and calls <svc_run>.
    Consumer_Handler ();

    // Put <Message> into the queue monitor object,
    // blocking until there's room in the queue.
    void put (const Message &msg) {
        message_queue_.put (msg);
    }
private:
    // Message queue implemented as a monitor object.
    Message_Queue message_queue_;

    // Connection to the remote consumer.
    SOCK_Stream connection_;

    // Entry point to a distinct consumer handler thread.
    static void *svc_run (void *arg);
};

Each Supplier_Handler runs in its own thread, receives messages from its remote supplier and routes the messages to the designated remote consumers. Routing is performed by inspecting an address field in each message, which is used as a key into a routing table that maps keys to Consumer_Handlers.

Each Consumer_Handler receives messages from suppliers via its put() method and stores each message in its Message_Queue monitor object. A Supplier_Handler routes each message to the appropriate Consumer_Handler as follows:

void Supplier_Handler::route_message (const Message &msg) {
    // Locate the appropriate <Consumer_Handler> based
    // on address information in the <Message>.
    Consumer_Handler *consumer_handler =
        routing_table_.find (msg.address ());

    // Put <Message> into the <Consumer_Handler>, which
    // stores it in its <Message_Queue> monitor object.
    consumer_handler->put (msg);
}

To process the messages placed into its message queue by Supplier_Handlers, each Consumer_Handler spawns a separate thread of control in its constructor using the Thread_Manager class defined in the Wrapper Facade pattern (47), as follows:

Consumer_Handler::Consumer_Handler () {
    // Spawn a separate thread to get messages from the
    // message queue and send them to the remote consumer.
    Thread_Manager::instance ()->spawn (&svc_run, this);
}

This new Consumer_Handler thread executes the svc_run() entry point. This is a static method that retrieves routing messages placed into its message queue by Supplier_Handler threads and sends them over its TCP connection to the remote consumer:

void *Consumer_Handler::svc_run (void *args) {
    Consumer_Handler *this_obj =
        static_cast<Consumer_Handler *> (args);

    for (;;) {
        // Blocks on <get> until next <Message> arrives.
        Message msg = this_obj->message_queue_.get ();

        // Transmit message to the consumer.
        this_obj->connection_.send (msg, msg.length ());
    }
}

The SOCK_Stream's send() method can block in a Consumer_Handler thread. It will not affect the quality of service of other Consumer_Handler or Supplier_Handler threads, because it does not share any data with the other threads. Similarly, Message_Queue::get() can block without affecting the quality of service of other threads, because the Message_Queue is a monitor object. Supplier_Handlers can thus insert new messages into the Consumer_Handler's Message_Queue via its put() method without blocking indefinitely.

Variants

Timed Synchronized Method Invocations. Certain applications require 'timed' synchronized method invocations. This feature allows them to set bounds on the time they are willing to wait for a synchronized method to enter its monitor object's critical section. The Balking pattern described in [Lea99a] can be implemented using timed synchronized method invocations.

The Message_Queue monitor object interface defined earlier can be modified to support timed synchronized method invocations:

class Message_Queue {
public:
    // Wait up to the <timeout> period to put <Message>
    // at the tail of the queue.
    void put (const Message &msg,
              Time_Value *timeout = 0);

    // Wait up to the <timeout> period to get <Message>
    // from the head of the queue.
    Message get (Time_Value *timeout = 0);
};

If timeout is 0 then both get() and put() will block indefinitely until a message is either inserted into or removed from a Message_Queue monitor object. If the time-out period is non-zero and it expires, the Timedout exception is thrown. The client must be prepared to handle this exception.
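As a sketch of client code for this variant, assuming the Timedout exception type named above and a hypothetical one-argument Time_Value constructor that takes a number of seconds, a caller might bound its wait as follows:

// Hypothetical client of the 'timed' Message_Queue interface.
void try_to_enqueue (Message_Queue &queue, const Message &msg) {
    Time_Value timeout (5);  // wait at most five seconds
    try {
        queue.put (msg, &timeout);
    }
    catch (const Timedout &) {
        // The queue stayed full for the entire period, so apply an
        // application-specific recovery, e.g. drop the message or
        // route it to an alternate consumer.
    }
}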

The following illustrates how the put() method can be implemented using the timed wait feature of the Thread_Condition condition variable wrapper outlined in implementation activity 3 (408):

void Message_Queue::put (const Message &msg,
                         Time_Value *timeout)
    /* throw (Timedout) */ {
    // ... Same as before ...
    while (full_i ())
        not_full_.wait (timeout);
    // ... Same as before ...
}

While the queue is full this 'timed' put() method releases monitor_lock_ and suspends the calling thread, to wait for space to become available in the queue or for the timeout period to elapse. The monitor_lock_ will be re-acquired automatically when wait() returns, regardless of whether a time-out occurred or not.
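A timed get() can mirror this structure; the following sketch assumes the same Thread_Condition::wait(timeout) behavior, with the Timedout exception propagating to the caller if the period elapses while the queue remains empty:

Message Message_Queue::get (Time_Value *timeout)
    /* throw (Timedout) */ {
    // ... Same as before ...
    while (empty_i ())
        not_empty_.wait (timeout);
    // ... Same as before ...
}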

Strategized Locking. The Strategized Locking pattern (333) can be applied to make a monitor object implementation more flexible, efficient, reusable, and robust. Strategized Locking can be used, for example, to configure a monitor object with various types of monitor locks and monitor conditions.

The following template class uses generic programming techniques [Aus98] to parameterize the synchronization aspects of a Message_Queue:

template <class SYNCH_STRATEGY>
class Message_Queue {
private:
    typename SYNCH_STRATEGY::Mutex monitor_lock_;
    typename SYNCH_STRATEGY::Condition not_empty_;
    typename SYNCH_STRATEGY::Condition not_full_;
    // ...
};

Each synchronized method is then modified as shown by the following empty() method:

template <class SYNCH_STRATEGY>
bool Message_Queue<SYNCH_STRATEGY>::empty () const {
    Guard<typename SYNCH_STRATEGY::Mutex> guard (monitor_lock_);
    return empty_i ();
}

To parameterize the synchronization aspects associated with a Message_Queue, we can define a pair of classes, MT_Synch and Null_Synch, that typedef the appropriate C++ traits:

class MT_Synch {
public:
    // Synchronization traits.
    typedef Thread_Mutex Mutex;
    typedef Thread_Condition Condition;
};

class Null_Synch {
public:
    // Synchronization traits.
    typedef Null_Mutex Mutex;
    typedef Null_Thread_Condition Condition;
};

To define a thread-safe Message_Queue, therefore, we simply parameterize it with the MT_Synch strategy:

Message_Queue<MT_Synch> message_queue;

Similarly, to create a non-thread-safe Message_Queue, we can parameterize it with the Null_Synch strategy:

Message_Queue<Null_Synch> message_queue;

Note that when using the Strategized Locking pattern in C++ it may not be possible for a generic component class to know what type of synchronization strategy will be configured for a particular application. It is important therefore to apply the Thread-Safe Interface pattern (345), as described in implementation activity 4.2 (411), to ensure that intra-object method calls, such as put() calling full_i() and put_i(), avoid self-deadlock and minimize locking overhead.

Multiple Roles. If a monitor object implements multiple roles, each of which is used by different types of clients, an interface can be introduced for each role. Applying the Extension Interface pattern (141) allows clients to obtain the interface they need. This design helps separate concerns, because a client only sees the particular methods of a monitor object it needs for its own operation. This design further simplifies a monitor object's evolution. For example, new services can be added to the monitor object by providing new extension interfaces without changing existing ones. Clients that do not need access to the new services are thus unaffected by the extension.
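As a rough sketch of this variant, and using hypothetical role interfaces that are not part of the gateway example, the Message_Queue monitor object could expose its enqueue and dequeue roles separately so that supplier-side code sees only put() and consumer-side code sees only get():

// Hypothetical role interfaces for the two types of clients.
class Message_Sink {
public:
    virtual void put (const Message &msg) = 0;
    virtual ~Message_Sink () { }
};

class Message_Source {
public:
    virtual Message get () = 0;
    virtual ~Message_Source () { }
};

// The monitor object implements both roles; its synchronized
// methods are unchanged from the Implementation section.
class Message_Queue : public Message_Sink,
                      public Message_Source {
public:
    virtual void put (const Message &msg);  // synchronized
    virtual Message get ();                 // synchronized
    // ...
};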

Known Uses

Dijkstra and Hoare-style Monitors. Dijkstra [Dij68] and Hoare [Hoare74] defined programming language features called monitors that encapsulate functions and their internal variables into thread-safe modules. To prevent race conditions a monitor contains a lock that allows only one function at a time to be active within the monitor. Functions that want to leave the monitor temporarily can block on a condition variable. It is the responsibility of the programming language compiler to generate run-time code that implements and manages a monitor's lock and its condition variables.

Java Objects. The main synchronization mechanism in Java is based on Dijkstra/Hoare-style monitors. Each Java object can be a monitor object containing a monitor lock and a single monitor condition. Java's monitors are simple to use for common use cases, because they allow threads to serialize their execution implicitly via method-call interfaces and to coordinate their activities via calls to wait(), notify(), and notifyAll() methods defined on all objects.

For more complex use cases, however, the simplicity of the Java language constructs may mislead developers into thinking that concurrency is easier to program than it actually is in practice. In particular, heavy use of inter-dependent Java threads can yield complicated inter-relationships, starvation, deadlock, and overhead. [Lea99a] describes many patterns for handling simple and complex concurrency use cases in Java.

The Java language synchronization constructs outlined above can be implemented in several ways inside a compliant Java virtual machine (JVM). JVM implementors must choose between two implementation approaches:
§ Implement Java threads internally in the JVM. If threads are implemented internally, the JVM appears as one monolithic task to the operating system. In this case, the JVM is free to decide when to suspend and resume threads and how to implement thread scheduling, as long as it stays within the bounds of the Java language specification.
§ Map Java threads to native operating system threads. In this case Java monitors can take advantage of the synchronization primitives and scheduling behavior of the underlying platform.

The advantage of an internal threads implementation is its platform-independence. However, one of its disadvantages is its inability to take advantage of parallelism in the hardware. As a result, an increasing number of JVMs are implemented by mapping Java threads to native operating system threads.

ACE Gateway. The example from the Example Resolved section is based on a communication gateway application contained in the ACE framework [Sch96], which uses monitor objects to simplify concurrent programming and improve performance on multiprocessors. Unlike the Dijkstra/Hoare and Java monitors, which are programming language features, the Message_Queues used by Consumer_Handlers in the gateway are reusable ACE C++ components implemented using the Monitor Object pattern. Although C++ does not support monitor objects directly as a language feature, ACE implements the Monitor Object pattern by applying other patterns and idioms, such as the Guarded Suspension pattern [Lea99a] and the Scoped Locking (325) idiom, as described in the Implementation section.

Fast food restaurant. A real-life example of the Monitor Object pattern occurs when ordering a meal at a busy fast food restaurant. Customers are the clients who wait to place their order with a cashier. Only one customer at a time interacts with a cashier. If the order cannot be serviced immediately, a customer temporarily steps aside so that other customers can place their orders. When the order is ready the customer re-enters at the front of the line and can pick up the meal from the cashier.

Consequences

The Monitor Object pattern provides two benefits:

Simplification of concurrency control. The Monitor Object pattern presents a concise programming model for sharing an object among cooperating threads. For example, object synchronization corresponds to method invocations. Similarly, clients need not be concerned with concurrency control when invoking methods on a monitor object. It is relatively straightforward to create a monitor object out of most so-called passive objects, which are objects that borrow the thread of control of their callers to execute their methods.

Simplification of scheduling method execution. Synchronized methods use their monitor conditions to determine the circumstances under which they should suspend or resume their execution and that of collaborating monitor objects. For example, methods can suspend themselves and wait to be notified when arbitrarily complex conditions occur, without using inefficient polling. This feature makes it possible for monitor objects to schedule their methods cooperatively in separate threads.

The Monitor Object pattern has the following four liabilities:

The use of a single monitor lock can limit scalability due to increased contention when multiple threads serialize on a monitor object.

Complicated extensibility semantics resulting from the coupling between a monitor object's functionality and its synchronization mechanisms. It is relatively straightforward to decouple an active object's (369) functionality from its synchronization policies via its separate scheduler participant. However, a monitor object's synchronization and scheduling logic is often tightly coupled with its methods' functionality. This coupling often makes monitor objects more efficient than active objects. Yet it also makes it hard to change their synchronization policies or mechanisms without modifying the monitor object's method implementations.

It is also hard to inherit from a monitor object transparently, due to the inheritance anomaly problem [MWY91]. This problem inhibits reuse of synchronized method implementations when subclasses require different synchronization mechanisms. One way to reduce the coupling of synchronization and functionality in monitor objects is to use Aspect-Oriented Programming [KLM+97] or the Strategized Locking (333) and Thread-Safe Interface (345) patterns, as shown in the Implementation and Variants section.

Nested monitor lockout. This problem is similar to the preceding liability. It can occur when a monitor object is nested within another monitor object.

Consider the following two Java classes:

class Inner {
    protected boolean cond_ = false;

    public synchronized void awaitCondition () {
        while (!cond_)
            try {
                wait ();
            } catch (InterruptedException e) { }
        // Any other code.
    }

    public synchronized void notifyCondition (boolean c) {
        cond_ = c;
        notifyAll ();
    }
}

class Outer {
    protected Inner inner_ = new Inner ();

    public synchronized void process () {
        inner_.awaitCondition ();
    }

    public synchronized void set (boolean c) {
        inner_.notifyCondition (c);
    }
}

This code illustrates the canonical form of the nested monitor lockout problem in Java [JS97a]. When a Java thread blocks in a monitor's wait queue, all its locks are held except the lock of the object on whose wait queue it is blocked.

Consider what would happen if thread T1 made a call to Outer.process() and as a result blocked in the wait() call in Inner.awaitCondition(). In Java, the Inner and Outer classes do not share their monitor locks. The wait() call in awaitCondition() would therefore release the Inner monitor while retaining the Outer monitor. Another thread T2 cannot then acquire the Outer monitor, because it is locked by the synchronized process() method. As a result, Outer.set() cannot set Inner.cond_ to true and T1 will continue to block in wait() forever.

Nested monitor lockout can be avoided by sharing a monitor lock between multiple monitor conditions. This is straightforward in Monitor Object pattern implementations based on POSIX condition variables [IEEE96]. It is surprisingly hard in Java due to its simple concurrency and synchronization model, which tightly couples a monitor lock with each monitor object. Idioms for avoiding nested monitor lockout in Java are described in [Lea99a] [JS97a].
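As a sketch of the POSIX-style remedy, and reusing the wrapper facade classes from the Implementation section, a nested helper can be handed its enclosing monitor's lock so that both objects wait and notify through the same mutex; the Inner_Helper class here is hypothetical:

// Hypothetical helper nested inside a monitor object. It shares the
// enclosing monitor's lock instead of defining its own, so waiting in
// the helper releases the one lock its callers actually hold.
class Inner_Helper {
public:
    Inner_Helper (Thread_Mutex &shared_lock)
        : cond_ (false), condition_ (shared_lock) { }

    // Called with <shared_lock> held by the enclosing monitor.
    void await_condition () {
        while (!cond_)
            condition_.wait ();  // releases the shared lock atomically
    }

    // Also called with <shared_lock> held.
    void notify_condition (bool c) {
        cond_ = c;
        condition_.notify ();
    }
private:
    bool cond_;
    Thread_Condition condition_;  // built on the enclosing monitor's lock
};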

See Also

The Monitor Object pattern is an object-oriented analog of the Code Locking pattern [McK95], which ensures that a region of code is serialized. In the Monitor Object pattern, the region of code is the synchronized method implementation.

The Monitor Object pattern has several properties in common with the Active Object pattern (369). Both patterns can synchronize and schedule methods invoked concurrently on objects, for example. There are two key differences, however:
§ An active object executes its methods in a different thread than its client(s), whereas a monitor object executes its methods in its client threads. As a result, active objects can perform more sophisticated, albeit more expensive, scheduling to rearrange the order in which their methods execute.

§ Monitor objects often couple their synchronization logic more closely to their methods' functionality. In contrast, it is easier to decouple an active object's functionality from its synchronization policies, because it has a separate scheduler.

It is instructive to compare the Monitor Object pattern solution in the Example Resolved section with the solution presented in the Active Object pattern. Both solutions have similar overall application architectures. In particular, the Supplier_Handler and Consumer_Handler implementations are almost identical. The primary difference is that the Message_Queue itself is easier to program and often more efficient when implemented using the Monitor Object pattern than the Active Object pattern.

If a more sophisticated queueing strategy is necessary, however, the Active Object pattern may be more appropriate. Similarly, because active objects execute in different threads than their clients, there are situations where active objects can improve overall application concurrency by executing multiple operations asynchronously. When these operations complete, clients can obtain their results via futures [Hal85] [LS88].

[4]For an in-depth discussion of the gateway and its associated components, we recommend reading the Active Object pattern (369) before reading this pattern.

[5]An active object, in contrast, does have its own thread of control.

Half-Sync/Half-Async

The Half-Sync/Half-Async architectural pattern decouples asynchronous and synchronous service processing in concurrent systems, to simplify programming without unduly reducing performance. The pattern introduces two intercommunicating layers, one for asynchronous and one for synchronous service processing.

Example

Performance-sensitive concurrent applications, such as telecommunications switching systems and avionics mission computers, perform a mixture of synchronous and asynchronous processing to coordinate different types of applications, system services, and hardware. Similar characteristics hold for system-level software, such as operating systems.

The BSD UNIX operating system [MBKQ96] [Ste98] is an example of a concurrent system that coordinates the communication between standard Internet application services, such as FTP, INETD, DNS, TELNET, SMTP, and HTTPD, and hardware I/O devices, such as network interfaces, disk controllers, end-user terminals, and printers.

The BSD UNIX operating system processes certain services asynchronously to maximize performance. Protocol processing within the BSD UNIX kernel, for example, runs asynchronously, because I/O devices are driven by interrupts triggered by network interface hardware. If the kernel does not handle these asynchronous interrupts immediately, hardware devices may malfunction and drop packets or corrupt memory buffers.

Although the BSD operating system kernel is driven by asynchronous interrupts, it is hard to develop applications and higher-level system services using asynchrony mechanisms, such as interrupts or signals. In particular, the effort required to program, validate, debug, and maintain asynchronous programs can be prohibitive. For example, asynchrony can cause subtle timing problems and race conditions when an interrupt preempts a running computation unexpectedly.

To avoid the complexities of asynchronous programming, higher-level services in BSD UNIX run synchronously in multiple processes. For example, FTP or TELNET Internet services that use synchronous read() and write() system calls can block awaiting the completion of I/O operations. Blocking I/O, in turn, enables developers to maintain state information and execution history implicitly in the run-time stacks of their threads, rather than in separate data structures that must be managed explicitly by developers.

Within the context of an operating system, however, synchronous and asynchronous processing is not wholly independent. In particular, application-level Internet services that execute synchronously within BSD UNIX must cooperate with kernel-level protocol processing that runs asynchronously. For example, the synchronous read() system call invoked by an HTTP server cooperates indirectly with the asynchronous reception and protocol processing of data arriving on the Ethernet network interface.

A key challenge in the development of BSD UNIX was the structuring of asynchronous and synchronous processing, to enhance both programming simplicity and system performance. In particular, developers of synchronous application programs must be shielded from the complex details of asynchronous programming. Yet, the overall performance of the system must not be degraded by using inefficient synchronous processing mechanisms in the BSD UNIX kernel.

Context

A concurrent system that performs both asynchronous and synchronous processing services that must intercommunicate.

Problem

Concurrent systems often contain a mixture of asynchronous and synchronous processing services. There is a strong incentive for system programmers to use asynchrony to improve performance. Asynchronous programs are generally more efficient, because services can be mapped directly onto asynchrony mechanisms, such as hardware interrupt handlers or software signal handlers.

Conversely, there is a strong incentive for application developers to use synchronous processing to simplify their programming effort. Synchronous programs are usually less complex, because certain services can be constrained to run at well-defined points in the processing sequence.

Two forces must therefore be resolved when specifying a software architecture that executes services both synchronously and asynchronously:
§ The architecture should be designed so that application developers who want the simplicity of synchronous processing need not address the complexities of asynchrony. Similarly, system developers who must maximize performance should not need to address the inefficiencies of synchronous processing.

§ The architecture should enable the synchronous and asynchronous processing services to communicate without complicating their programming model or unduly degrading their performance.

Although the need for both programming simplicity and high performance may seem contradictory, it is essential that both these forces be resolved in certain types of concurrent systems, particularly large-scale or complex ones.

Solution

Decompose the services in the system into two layers [POSA1], synchronous and asynchronous, and add a queueing layer between them to mediate the communication between services in the asynchronous and synchronous layers.

In detail: process higher-layer services, such as long-duration database queries or file transfers, synchronously in separate threads or processes, to simplify concurrent programming. Conversely, process lower-layer services, such as short-lived protocol handlers driven by interrupts from network interface hardware, asynchronously to enhance performance. If services residing in separate synchronous and asynchronous layers must communicate or synchronize their processing, allow them to pass messages to each other via a queueing layer.
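As a minimal sketch of this solution, reusing the Message_Queue monitor object from the Monitor Object pattern (399) as the queueing layer and assuming a hypothetical decode() helper, an asynchronous event handler can hand work to synchronous worker threads:

// Queueing layer: a monitor object shared by both layers.
static Message_Queue request_queue;

// Asynchronous layer: called from an event demultiplexing callback,
// for example a Reactor (179) event handler. It must not block, so it
// only converts the raw input into a <Message> and enqueues it. A real
// design would give the queueing layer a non-blocking enqueue so that
// a full queue cannot stall this layer; put() is used here for brevity.
void async_input_ready (const char *raw_input, size_t len) {
    Message msg = decode (raw_input, len);  // hypothetical helper
    request_queue.put (msg);
}

// Synchronous layer: each worker runs in its own thread and may block
// freely, both on the queue and on long-duration service processing.
void *sync_worker (void *) {
    for (;;) {
        Message msg = request_queue.get ();  // blocks until work arrives
        // ... perform long-duration, possibly blocking, processing ...
    }
    return 0;
}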

Structure

The structure of the Half-Sync/Half-Async pattern follows the Layers pattern [POSA1] and includes four participants:

The synchronous service layer performs high-level processing services. Services in the synchronous layer run in separate threads or processes that can block while performing operations.

The Internet services in our operating system example run in separate application processes. These processes invoke read() and write() operations to perform I/O synchronously on behalf of their Internet services.

The asynchronous service layer performs lower-level processing services, which typically emanate from one or more external event sources. Services in the asynchronous layer cannot block while performing operations without unduly degrading the performance of other services.

The processing of I/O devices and protocols in the BSD UNIX operating system kernel is performed asynchronously in interrupt handlers. These handlers run to completion, that is, they do not block or synchronize their execution with other threads until they are finished.

The queueing layer provides the mechanism for communicating between services in the synchronous and asynchronous layers. For example, messages containing data and control information are produced by asynchronous services, then buffered at the queueing layer for subsequent retrieval by synchronous services, and vice versa. The queueing layer is responsible for notifying services in one layer when messages are passed to them from the other layer. The queueing layer therefore enables the asynchronous and synchronous layers to interact in a 'producer/consumer' manner, similar to the structure defined by the Pipes and Filters pattern [POSA1].

The BSD UNIX operating system provides a Socket layer [Ste98]. This layer serves as the buffering and notification point between the synchronous Internet service application processes and the asynchronous, interrupt-driven I/O hardware services in the BSD UNIX kernel.

External event sources generate events that are received and processed by the asynchronous service layer. Common sources of external events for operating systems include network interfaces, disk controllers, and end-user terminals.

The following class diagram illustrates the structure and relationships between these participants:

Dynamics

Asynchronous and synchronous layers in the Half-Sync/Half-Async pattern interact by passing messages via a queueing layer. We describe three phases of interactions that occur when input arrives 'bottom-up' from external event sources:
§ Asynchronous phase. In this phase external sources of input interact with the asynchronous service layer via an asynchronous event notification, such as an interrupt or signal. When asynchronous services have finished processing the input, they can communicate their results to the designated services in the synchronous layer via the queueing layer.


§ Queueing phase. In this phase the queueing layer buffers input passed from the asynchronous layer to the synchronous layer and notifies the synchronous layer that input is available.

§ Synchronous phase. In this phase the appropriate service(s) in the synchronous layer retrieve and process the input placed into the queueing layer by service(s) in the asynchronous layer.

The interactions between layers and pattern participants are reversed to form a 'top-down' sequence when output arrives from services running in the synchronous layer.

Implementation

This section describes the activities used to implement the Half-Sync/Half-Async pattern and apply it to structure the concurrency architecture of higher-level applications, such as Web servers [Sch97] and database servers, as well as to lower-level systems, such as the BSD UNIX operating system. We therefore present examples from several different domains.

1. Decompose the overall system into three layers: synchronous, asynchronous, and queueing. Three sub-activities can be used to determine how to decompose a system architecture designed in accordance with the Half-Sync/Half-Async pattern.

1. Identify higher-level and/or long-duration services and configure them into the synchronous layer. Many services in a concurrent system are easier to implement when they are programmed using synchronous processing. These services often perform relatively high-level or long-duration application processing, such as transferring large streams of content in a Web server or performing complex queries in a database. Services in the synchronous layer should therefore run in separate processes or threads. If data is not available the services can block at the queueing layer awaiting responses, under the control of peer-to-peer application communication protocols.

Each Internet service shown in our BSD UNIX operating system example runs in a separate application process. Each application process communicates with its clients using the protocol associated with the Internet service it implements. I/O operations within these processes can be performed by blocking synchronously on TCP Sockets and waiting for the BSD UNIX kernel to complete the I/O operations asynchronously.


2. Identify lower-level and/or short-duration services and configure them into the asynchronous layer. Certain services in a system cannot block for prolonged amounts of time. Such services typically perform lower-level or short-duration system processing that interacts with external sources of events, such as end-user terminals or interrupt-driven hardware network interfaces. To maximize responsiveness and efficiency, these sources of events must be handled rapidly and must not block the thread that services them. Their services should be triggered by asynchronous notifications or interrupts from external event sources and run to completion, at which point they can insert messages containing their results into the queueing layer.

In our operating system example, processing of I/O device drivers and communication protocols in the BSD UNIX kernel occurs in response to asynchronous hardware interrupts. Each asynchronous operation in the kernel runs to completion, inserting messages containing data and/or control information into the Socket layer if it must communicate with an application process running an Internet service in the synchronous layer.

3. Identify inter-layer communication strategies and configure them into the queueing layer. The queueing layer is a mediator [GoF95] that decouples the communication between services in the asynchronous and synchronous layers. Thus these services do not access each other directly, but only via the queueing layer. The communication-related strategies performed by the queueing layer involve (de)multiplexing, buffering, notification, and flow control. Services in the asynchronous and synchronous layers use these queueing strategies to implement protocols for passing messages between the synchronous and asynchronous layers [SC96].

In our BSD UNIX operating system example, the Sockets mechanism [Ste98] defines the queueing layer between the synchronous Internet service application processes and the asynchronous operating system kernel. Each Internet service uses one or more Sockets, which are queues maintained by BSD UNIX to buffer messages exchanged between application processes, and the TCP/IP protocol stack and networking hardware devices in the kernel.

2. Implement the services in the synchronous layer. High-level and/or long-duration services in the synchronous layer are often implemented using either multi-threading or multi-processing. Compared to a thread, a process maintains more state information and requires more overhead to spawn, synchronize, schedule, and inter-communicate. Implementing synchronous services in separate threads, rather than separate processes, can therefore yield simpler and more efficient applications.

Multi-threading can reduce application robustness, however, because separate threads within a process are not protected from one another. For instance, one faulty thread can corrupt data shared with other threads in the process, which may produce incorrect results, crash the process, or cause the process to hang indefinitely. To increase robustness, therefore, application services can be implemented in separate processes.

The Internet services in our BSD UNIX example are implemented in separate processes. This design increases their robustness and prevents unauthorized access to certain resources, such as files owned by other users.
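The following sketch contrasts the two configurations; run_synchronous_service() is the hypothetical service function sketched in the Structure section, and the set of connected sockets is assumed to come from elsewhere.

#include <unistd.h>    // fork(), _exit()
#include <thread>
#include <vector>

void run_synchronous_service(int connected_socket);   // hypothetical service

// Thread-per-service: cheap to spawn and to communicate with, but one
// faulty service can corrupt state shared with every other service.
void spawn_services_as_threads(const std::vector<int>& sockets) {
    std::vector<std::thread> services;
    for (int s : sockets)
        services.emplace_back(run_synchronous_service, s);
    for (auto& t : services)
        t.join();
}

// Process-per-service: heavier-weight, but the operating system
// protects each service from faults in the others.
void spawn_services_as_processes(const std::vector<int>& sockets) {
    for (int s : sockets) {
        if (fork() == 0) {                 // child process
            run_synchronous_service(s);
            _exit(0);
        }
    }
    // A complete implementation would wait for its child processes here.
}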

3. Implement the services in the asynchronous layer. Lower-level and/or shorter-duration services in the asynchronous layer often do not have their own dedicated thread of control. Instead, they must borrow a thread from elsewhere, such as the operating system kernel's 'idle thread' or a separate interrupt stack. To ensure adequate response time for other system services, such as high-priority hardware interrupts, these services must run asynchronously and cannot block for long periods of time.

Two strategies can be used to trigger the execution of asynchronous services:

§ Asynchronous interrupts. This strategy is often used when developing asynchronous services that are triggered directly by hardware interrupts from external event sources, such as network interfaces or disk controllers. In this strategy, when an event occurs on an external event source, an interrupt notifies the handler associated with the event, which then processes the event to completion.

In complex concurrent systems, it may be necessary to define a hierarchy of interrupts to allow less critical handlers to be preempted by higher-priority ones. To prevent interrupt handlers from corrupting shared state while they are being accessed, data structures used by the asynchronous layer must be protected, for example by raising the interrupt priority [WS95].

The BSD UNIX kernel uses a two-level interrupt scheme to handle network packet processing [MBKQ96]. Time-critical processing is done at a high priority and less critical software processing is done at a lower priority. This two-level interrupt scheme prevents the overhead of software protocol processing from delaying the servicing of high-priority hardware interrupts.

§ Proactive I/O. This strategy is often used when developing asynchronous services based on higher-level operating system APIs, such as the Windows NT overlapped I/O and I/O completion ports [Sol98] or the POSIX aio_* family of asynchronous I/O system calls [POSIX95]. In this strategy, I/O operations are executed by an asynchronous operation processor. When an asynchronous operation finishes, the asynchronous operation processor generates a completion event. This event is then dispatched to the handler associated with the event, which processes the event to completion.

§ For example, the Web server in the Proactor pattern (215) illustrates an application that uses the proactive I/O mechanisms defined by the Windows NT system call API. This example underscores the fact that asynchronous processing and the Half-Sync/Half-Async pattern can be used for higher-level applications that do not access hardware devices directly.

Both of these asynchronous processing strategies share the constraint that a handler cannot block for a long period of time without disrupting the processing of events from other external event sources.
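As a concrete, if simplified, example of the proactive I/O strategy above, the following sketch uses the POSIX aio_* calls; the polling loop stands in for work the service could perform while the asynchronous operation processor completes the read, and error handling is minimal.

#include <aio.h>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <unistd.h>

// Start an asynchronous read and wait for its completion event.
// A full asynchronous service layer would dispatch the completion to
// an event handler rather than polling as shown here.
ssize_t proactive_read(int fd, char* buffer, std::size_t length) {
    struct aiocb request;
    std::memset(&request, 0, sizeof(request));
    request.aio_fildes = fd;
    request.aio_buf    = buffer;
    request.aio_nbytes = length;
    request.aio_offset = 0;

    if (aio_read(&request) != 0)
        return -1;                           // operation could not be started

    // The asynchronous operation processor performs the I/O while the
    // caller is free to do other work.
    while (aio_error(&request) == EINPROGRESS)
        usleep(1000);                        // placeholder for useful work

    return aio_return(&request);             // bytes read, or -1 on failure
}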

4. Implement the queueing layer. After services in the asynchronous layer finish processing input arriving from external event sources, they typically insert the resulting messages into the queueing layer. The appropriate service in the synchronous layer will subsequently remove these messages from the queueing layer and process them. These roles are reversed for output processing. Two communication-related strategies must be defined when implementing the queueing layer:

1. Implement the buffering strategy. Services in the asynchronous and synchronous layers do not access each other's memory directly. Instead, they exchange messages via a queueing layer. This queueing layer buffers messages so that synchronous and asynchronous services can run concurrently, rather than running in lockstep via a 'stop-and-wait' flow control protocol. The buffering strategy must therefore implement an ordering, serialization, notification, and flow-control strategy. Note that the Strategy pattern [GoF95] can be applied to simplify the configuration of alternative strategies.

§ Implement the ordering strategy. Simple queueing layers store their messages in the order they arrive, that is, 'first-in, first-out' (FIFO). The first message that was placed in the queue by a service in one layer is thus the first message to be removed by a service in the other layer. FIFO ordering is easy to implement, but may result in priority inversions [SMFG00] if high-priority messages are queued behind lower-priority messages. Therefore, more sophisticated queueing strategies can be used to store and retrieve messages in 'priority' order.

§ Implement the serialization strategy. Services in the asynchronous and synchronous layer can execute concurrently. A queue must therefore be serialized to avoid race conditions when messages are inserted and removed concurrently. This serialization is often implemented using lightweight synchronization mechanisms, such as mutexes [Lew95]. Such mechanisms ensure that messages can be inserted into and removed from the queueing layer's message buffers without corrupting its internal data structures.

§ Implement the notification strategy. It may be necessary to notify a service in one layer when messages addressed to it arrive from another layer. The notification strategy provided by the queueing layer is often implemented using more sophisticated and heavyweight synchronization mechanisms, such as semaphores or condition variables [Lew95]. These synchronization mechanisms can notify the appropriate services in the synchronous or asynchronous layers when data arrives for them in the queueing layer. The Variants section outlines several other notification strategies based on asynchronous signals and interrupts.

§ Implement the flow-control strategy. Systems cannot devote an unlimited amount of resources to buffer messages in the queueing layer. It may therefore be necessary to regulate the amount of data passed between the synchronous and asynchronous layers. Flow control is a technique that prevents synchronous services from flooding the asynchronous layer at a rate greater than that at which messages can be transmitted and queued on network interfaces [SchSu93].

Services in the synchronous layer can block. A common flow control policy simply puts a synchronous service to sleep if it produces and queues more than a certain number of messages. After the asynchronous service layer empties the queue to below a certain level, the queueing layer can awaken the synchronous service to continue its processing.

In contrast, services in the asynchronous layer cannot block. If they produce an excessive number of messages, a common flow-control policy allows the queueing layer to discard messages until the synchronous service layer finishes processing the messages in its queue. If the messages are associated with a reliable connection-oriented transport protocol, such as TCP [Ste93], senders will eventually time out and retransmit discarded messages.
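The buffering, serialization, notification, and flow-control strategies above can be combined in one bounded queue. The sketch below is a minimal Monitor Object style version, assuming a generic Message type and a fixed capacity; a put() that blocks is only suitable for synchronous producers, so an asynchronous producer would use a non-blocking variant that discards messages when the queue is full, as discussed above.

#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// A bounded queueing layer in the Monitor Object style. put() blocks a
// synchronous producer when the queue is full (flow control), while
// get() blocks a consumer when the queue is empty (notification).
template <typename Message>
class MessageQueue {
public:
    explicit MessageQueue(std::size_t capacity) : capacity_(capacity) {}

    void put(Message m) {
        std::unique_lock<std::mutex> lock(mutex_);          // serialization
        not_full_.wait(lock, [this] { return queue_.size() < capacity_; });
        queue_.push_back(std::move(m));                     // FIFO ordering
        not_empty_.notify_one();                            // notification
    }

    Message get() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !queue_.empty(); });
        Message m = std::move(queue_.front());
        queue_.pop_front();
        not_full_.notify_one();                             // wake a blocked producer
        return m;
    }

private:
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::deque<Message> queue_;
    const std::size_t capacity_;
};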

2. Implement the (de)multiplexing mechanism. In simple implementations of the Half-Sync/Half-Async pattern, such as the OLTP servers described in the Example section of the Leader/Followers pattern (447), there is only one queue in the queueing layer. This queue is shared by all services in the asynchronous and synchronous layers and any service can process any request. This configuration alleviates the need for a sophisticated (de)multiplexing mechanism. In this case, a common implementation is to define a singleton [GoF95] queue that all services use to insert and remove messages.

In more complex implementations of the Half-Sync/Half-Async pattern, services in one layer may need to send and receive certain messages to particular services in another layer. A queueing layer may therefore need multiple queues, for example one queue per service. With multiple queues, more sophisticated demultiplexing mechanisms are needed to ensure messages exchanged between services in different layers are placed in the appropriate queue. A common implementation is to use some type of (de)multiplexing mechanism, such as a hash table [HMPT89] [MD91], to place messages into the appropriate queue(s).
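A minimal sketch of such a (de)multiplexing mechanism follows; it reuses the MessageQueue sketched above and assumes each message carries an integer service identifier, which is purely illustrative.

#include <cstddef>
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical request message that names its target service.
struct Request {
    int service_id;
    // ... payload ...
};

// Demultiplexes messages to per-service queues via a hash table, so
// each service only retrieves the messages addressed to it.
// MessageQueue is the bounded queue sketched earlier.
class QueueingLayer {
public:
    void register_service(int service_id, std::size_t capacity) {
        queues_[service_id] =
            std::make_unique<MessageQueue<Request>>(capacity);
    }

    void put(Request r) {                   // called by the producing layer
        queues_.at(r.service_id)->put(std::move(r));
    }

    Request get(int service_id) {           // called by the consuming service
        return queues_.at(service_id)->get();
    }

private:
    std::unordered_map<int, std::unique_ptr<MessageQueue<Request>>> queues_;
};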

The Message_Queue components defined in the Monitor Object (399) and Active Object (369) patterns illustrate various strategies for implementing a queueing layer:

§ The Monitor Object pattern ensures that only one method at a time executes within a queue, regardless of the number of threads that invoke the queue's methods concurrently, by using mutexes and condition variables. The queue executes its methods in its client threads, that is, in the threads that run the synchronous and asynchronous services.

§ The Active Object pattern decouples method invocations on the queue from method execution. Multiple synchronous and asynchronous services can therefore invoke methods on the queue concurrently. Methods are executed in a different thread than the threads that run the synchronous and asynchronous services.

The See Also sections of the Active Object (369) and Monitor Object (399) patterns discuss the pros and cons of using these patterns to implement a queueing layer.


Example Resolved

Chapter 1, Concurrent and Networked Objects, and other patterns in this book, such as Proactor (215), Scoped Locking (325), Strategized Locking (333), and Thread-Safe Interface (345), illustrate various aspects of the design of a Web server application. In this section, we explore the broader system context in which Web servers execute, by outlining how the BSD UNIX operating system [MBKQ96] [Ste93] applies the Half-Sync/Half-Async pattern to receive an HTTP GET request via its TCP/IP protocol stack over Ethernet.

BSD UNIX is an example of an operating system that does not support asynchronous I/O efficiently. It is therefore not feasible to implement the Web server using the Proactor pattern (215). We instead outline how BSD UNIX coordinates the services and communication between synchronous application processes and the asynchronous operating system kernel.

In particular, we describe:[6]

§ The synchronous invocation of a read() system call by a Web server application (the HTTPD process).

§ The asynchronous reception and protocol processing of data arriving on the Ethernet network interface.

§ The synchronous completion of the read() call, which returns control and the GET request data back to the HTTPD process.

These steps are shown in the following figure:

As shown in this figure, the HTTPD process invokes a read() system call on a connected socket handle to receive an HTTP GET request encapsulated in a TCP packet. From the perspective of the HTTPD process, the read() system call is synchronous, because the process invokes read() and blocks until the GET request data is returned. If data is not available immediately, however, the BSD UNIX kernel puts the HTTPD process to sleep until the data arrives from the network.

Many asynchronous steps occur to implement the synchronous read() system call, however. Although the HTTPD process can sleep while waiting for data, the BSD UNIX kernel cannot sleep, because other application processes, such as the FTP and TELNET services and I/O devices in the kernel, require its services to run concurrently and efficiently.

After the read() system call is issued the application process switches to 'kernel mode' and starts running privileged instructions, which direct it synchronously into the BSD UNIX networking subsystem. Ultimately, the thread of control from the application process ends in the kernel's soreceive() function. This function processes input for various types of sockets, such as datagram sockets and stream sockets, by transferring data from the socket queue to the application process. The soreceive() function thus defines the boundary between the synchronous application process layer and the asynchronous kernel layer for outgoing packets.

There are two ways in which the HTTPD process's read() system call can be handled by soreceive(), depending on the characteristics of the Socket and the amount of data in the socket queue:

§ Completely synchronous. If the data requested by the HTTPD process is in the socket queue, the soreceive() function can copy it immediately and the read() system call will complete synchronously.

§ Half-synchronous and half-asynchronous. If the data requested by the HTTPD process is not yet available, the kernel calls the sbwait() function to put the process to sleep until the requested data arrives.

After sbwait() puts the process to sleep, the BSD UNIX scheduler will switch to another process context that is ready to run. From the perspective of the HTTPD process, however, the read() system call appears to execute synchronously. When packet(s) containing the requested data arrive, the kernel will process them asynchronously, as described below. When enough data has been placed in the socket queue to satisfy the HTTPD process' request, the kernel will wake this process and complete its read() system call. This call then returns synchronously so that the HTTPD process can parse and execute the GET request.

To maximize performance within the BSD UNIX kernel, all protocol processing is executed asynchronously, because I/O devices are driven by hardware interrupts. For example, packets arriving at the Ethernet network interface are delivered to the kernel via interrupt handlers initiated asynchronously by the Ethernet hardware. These handlers receive packets from devices and trigger subsequent asynchronous processing of higher-layer protocols, such as IP and TCP. Ultimately, valid packets containing application data are queued at the Socket layer, where the BSD UNIX kernel schedules and dispatches the waiting HTTPD process to consume this data synchronously.

For example, the 'half-async' processing associated with an HTTPD process's read() system call starts when a packet arrives at an Ethernet network interface, which triggers an asynchronous hardware interrupt. All incoming packet processing is performed in the context of an interrupt handler. During an interrupt, the BSD UNIX kernel cannot sleep or block, because there is no application process context and no dedicated thread of control. The Ethernet interrupt handler therefore 'borrows' the kernel's thread of control. Similarly, the BSD UNIX kernel borrows the threads of control of application processes when they invoke system calls.

If the packet is destined for an application process, it is passed up to the transport layer, which performs additional protocol processing, such as TCP segment reassembly and acknowledgments. Eventually, the transport layer appends the data to the receive socket queue and calls sbwakeup(), which represents the boundary between the asynchronous and synchronous layers for incoming packets. This call wakes up the HTTPD process that was sleeping in soreceive() waiting for data on that socket queue. If all the data requested by the HTTPD process has arrived, soreceive() will copy it to the buffer supplied by HTTPD, allowing the system call to return control to the Web server. The read() call thus appears to be synchronous from the perspective of the HTTPD process, even though asynchronous processing and context switching were performed while this process was asleep.

Variants

Asynchronous Control with Synchronous Data I/O. The HTTPD Web server described in the Example Resolved section 'pulls' messages synchronously from the queueing layer at its discretion, thereby combining control and data activities. On some operating system platforms, however, it is possible to decouple control and data so that services in the synchronous layer can be notified asynchronously when messages are inserted into the queueing layer. The primary benefit of this variant is that higher-level 'synchronous' services may be more responsive, because they can be notified asynchronously.

The UNIX signal-driven I/O mechanism [Ste98] implements this variant of the Half-Sync/Half-Async pattern. The UNIX kernel uses the SIGIO signal to 'push' control to a higher-level application process when data arrives on one of its Sockets. When a process receives this control notification asynchronously, it can then 'pull' the data synchronously from the socket queueing layer via read().
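A minimal sketch of this wiring is shown below; it only demonstrates how a socket can be configured to deliver SIGIO and how the handler records the asynchronous control notification, while the data itself is still pulled synchronously with read(). The descriptor and buffer size are placeholders.

#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t data_ready = 0;

// Asynchronous control: the kernel 'pushes' SIGIO when data arrives on
// the socket; the handler merely records that fact.
extern "C" void sigio_handler(int) { data_ready = 1; }

void enable_signal_driven_io(int socket_fd) {
    struct sigaction sa;
    sa.sa_handler = sigio_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGIO, &sa, nullptr);

    fcntl(socket_fd, F_SETOWN, getpid());           // deliver SIGIO to this process
    int flags = fcntl(socket_fd, F_GETFL);
    fcntl(socket_fd, F_SETFL, flags | O_ASYNC);     // enable signal-driven I/O
}

void service_loop(int socket_fd) {
    char buffer[4096];
    for (;;) {
        if (data_ready) {                           // control arrived asynchronously
            data_ready = 0;
            ssize_t n = read(socket_fd, buffer, sizeof(buffer));  // synchronous 'pull'
            (void)n;                                // processing of the data elided
        }
        // ... other application work ...
    }
}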

The disadvantage of using asynchronous control, of course, is that developers of higher-level services must now face many of the asynchrony complexities outlined in the Problem section.

Half-Async/Half-Async. This variant extends the previous variant by propagating asynchronous control notifications and data operations all the way up to higher-level services in the 'synchronous' layer. These higher-level services may therefore be able to take advantage of the efficiency of the lower-level asynchrony mechanisms.

For example, the real-time signal interface defined in the POSIX real-time programming specification [POSIX95] supports this variant. In particular, a buffer pointer can be passed to the signal handler function dispatched by the operating system when a real-time signal occurs. Windows NT supports a similar mechanism using overlapped I/O and I/O completion ports [Sol98]. In this case, when an asynchronous operation completes, its associated overlapped I/O structure indicates which operation has completed and passes any data along. The Proactor pattern (215) and Asynchronous Completion Token pattern (261) describe how to structure applications to take advantage of asynchronous operations and overlapped I/O.

The disadvantage of this variant is similar to that of the previous variant. If most or all services can be driven by asynchronous operations, the design may be modeled better by applying the Proactor pattern (215) rather than the Half-Sync/Half-Async pattern.

Half-Sync/Half-Sync. This variant provides synchronous processing to lower-level services. If the asynchronous layer is multi-threaded, its services can run autonomously and use the queueing layer to pass messages to the synchronous service layer. The benefit of this variant is that services in the asynchronous layer may be simplified, because they can block without affecting other services in this layer.

Microkernel operating systems, such as Mach [B190] or Amoeba [Tan95], typically use this variant. The microkernel runs as a separate multi-threaded 'process' that exchanges messages with application processes. Similarly, multi-threaded operating system macrokernels, such as Solaris [EKBF+92], can support multiple synchronous I/O operations in the kernel.

Multi-threading the kernel can be used to implement polled interrupts, which reduce the amount of context switching for high-performance continuous media systems by dedicating a kernel thread to poll a field in shared memory at regular intervals [CP95]. In contrast, single-threaded operating system kernels, such as BSD UNIX, restrict lower-level kernel services to use asynchronous I/O and only support synchronous multi-programming for higher-level application processes.

The drawback to providing synchronous processing to lower-level services, of course, is that it may increase overhead, thereby degrading overall system performance significantly.

Half-Sync/Half-Reactive. In object-oriented applications, the Half-Sync/Half-Async pattern can be implemented as a composite architectural pattern that combines the Reactor pattern (179) with the Thread Pool variant of the Active Object pattern (369). In this common variant, the reactor's event handlers constitute the services in the 'asynchronous' layer[7] and the queueing layer can be implemented by an active object's activation list. The servants dispatched by the scheduler in the active object's thread pool constitute the services in the synchronous layer. The primary benefit of this variant is the simplification it affords. This simplicity is achieved by performing event demultiplexing and dispatching in a single-threaded reactor that is decoupled from the concurrent processing of events in the active object's thread pool.

The OLTP servers described in the Example section of the Leader/Followers pattern (447) apply this variant. The 'asynchronous' service layer uses the Reactor pattern (179) to demultiplex transaction requests from multiple clients and dispatch event handlers. The handlers insert requests into the queueing layer, which is an activation list implemented using the Monitor Object pattern (399). Similarly, the synchronous service layer uses the thread pool variant of the Active Object pattern (369) to disseminate requests from the activation list to a pool of worker threads that service transaction requests from clients. Each thread in the active object's thread pool can block synchronously because it has its own run-time stack.
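A compressed sketch of this composition is shown below; it borrows the MessageQueue and Request types sketched earlier in the Implementation discussion, uses select() as the reactive demultiplexer, and elides the actual reading and handling of requests, which are application specific.

#include <sys/select.h>
#include <vector>

// 'Asynchronous' (reactive) layer: one thread demultiplexes the socket
// handles with select() and inserts requests into the queueing layer.
void reactive_io_thread(const std::vector<int>& handles,
                        MessageQueue<Request>& queue) {
    for (;;) {
        fd_set read_set;
        FD_ZERO(&read_set);
        int max_fd = -1;
        for (int h : handles) {
            FD_SET(h, &read_set);
            if (h > max_fd) max_fd = h;
        }
        if (select(max_fd + 1, &read_set, nullptr, nullptr, nullptr) <= 0)
            continue;

        for (int h : handles) {
            if (!FD_ISSET(h, &read_set)) continue;
            Request r{h};              // reading and parsing the request elided
            queue.put(r);              // hand off to the synchronous layer
        }
    }
}

// Synchronous layer: each worker thread blocks on the queue and can
// process its request to completion on its own run-time stack.
void worker_thread(MessageQueue<Request>& queue) {
    for (;;) {
        Request r = queue.get();
        // ... perform the transaction and return the result ...
        (void)r;
    }
}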

The drawback with this variant is that the queueing layer incurs additional context switching, synchronization, data allocation, and data copying overhead that may be unnecessary for certain applications. In such cases the Leader/Followers pattern (447) may be a more efficient, predictable, and scalable way to structure a concurrent application than the Half-Sync/Half-Async pattern.

Known Uses


UNIX Networking Subsystems. The BSD UNIX networking subsystem [MBKQ96] and the UNIX STREAMS communication framework [Ris98] use the Half-Sync/Half-Async pattern to structure the concurrent I/O architecture of application processes and the operating system kernel. I/O in these kernels is asynchronous and triggered by interrupts. The queueing layer is implemented by the Socket layer in BSD UNIX [Ste98] and by Stream Heads in UNIX STREAMS [Rago93]. I/O for application processes is synchronous.

Most UNIX network daemons, such as TELNETD and FTPD, are developed as application processes that invoke read() and write() system calls synchronously [Ste98]. This design shields application developers from the complexity of asynchronous I/O processed by the kernel. However, there are hybrid mechanisms, such as the UNIX SIGIO signal, that can be used to trigger synchronous I/O processing via asynchronous control notifications.

CORBA ORBs. MT-Orbix [Bak97] uses a variation of the Half-Sync/Half-Async pattern to dispatch CORBA remote operations in a concurrent server. In MT-Orbix's ORB Core a separate thread is associated with each socket handle that is connected to a client. Each thread blocks synchronously, reading CORBA requests from the client. When a request is received it is demultiplexed and inserted into the queueing layer. An active object thread in the synchronous layer then wakes up, dequeues the request, and processes it to completion by performing an upcall to the CORBA servant.

ACE. The ACE framework [Sch97] applies the 'Half-Sync/Half-Reactive' variant of the Half-Sync/Half-Async pattern in an application-level gateway that routes messages between peers in a distributed system [Sch96]. The ACE_Reactor is the ACE implementation of the Reactor pattern (179) that demultiplexes indication events to their associated event handlers in the 'asynchronous' layer. The ACE Message_Queue class implements the queueing layer, while the ACE Task class implements the thread pool variant of the Active Object pattern (369) in the synchronous service layer.

Conduit. The Conduit communication framework [Zweig90] from the Choices operating system project [CIRM93] implements an object-oriented version of the Half-Sync/Half-Async pattern. Application processes are synchronous active objects, an Adapter Conduit serves as the queueing layer, and the Conduit micro-kernel operates asynchronously, communicating with hardware devices via interrupts.

Restaurants. Many restaurants use a variant of the Half-Sync/Half-Async pattern. For example, restaurants often employ a host or hostess who is responsible for greeting patrons and keeping track of the order in which they will be seated if the restaurant is busy and it is necessary to queue them waiting for an available table. The host or hostess is 'shared' by all the patrons and thus cannot spend much time with any given party. After patrons are seated at a table, a waiter or waitress is dedicated to service that table.

Consequences

The Half-Sync/Half-Async pattern has the following benefits:

Simplification and performance. The programming of higher-level synchronous processing services is simplified without degrading the performance of lower-level system services. Concurrent systems often have a greater number and variety of high-level processing services than lower-level services. Decoupling higher-level synchronous services from lower-level asynchronous processing services can simplify application programming, because complex concurrency control, interrupt handling, and timing services can be localized within the asynchronous service layer. The asynchronous layer can also handle low-level details that may be hard for application developers to program robustly, such as interrupt handling. In addition, the asynchronous layer can manage the interaction with hardware-specific components, such as DMA, memory management, and I/O device registers.

The use of synchronous I/O can also simplify programming, and may improve performance on multi-processor platforms. For example, long-duration data transfers, such as downloading a large medical image from a hierarchical storage management system [PHS96], can be simplified and performed efficiently using synchronous I/O. In particular, one processor can be dedicated to the thread that is transferring the data. This enables the instruction and data cache of that CPU to be associated with the entire image transfer operation.

Separation of concerns. Synchronization policies in each layer are decoupled. Each layer therefore need not use the same concurrency control strategies. In the single-threaded BSD UNIX kernel, for example, the asynchronous service layer implements synchronization via low-level mechanisms, such as raising and lowering CPU interrupt levels. In contrast, application processes in the synchronous service layer implement synchronization via higher-level mechanisms, such as monitor objects (399) and synchronized message queues.

Legacy libraries, such as X Windows and older RPC toolkits, are often not re-entrant. Multiple threads of control therefore cannot invoke these library functions concurrently without incurring race conditions. To improve performance or to take advantage of multiple CPUs, however, it may be necessary to perform bulk data transfers or database queries in separate threads. In this case, the Half-Sync/Half-Reactive variant of the Half-Sync/Half-Async pattern can be applied to decouple the single-threaded portions of an application from its multi-threaded portions.

For example, an application's X Windows GUI processing could run under the control of a reactor. Similarly, long data transfers could run under the control of an active object thread pool. By decoupling the synchronization policies in each layer of the application via the Half-Sync/Half-Async pattern, non-re-entrant functions can continue to work correctly without requiring changes to existing code.

Centralization of inter-layer communication. Inter-layer communication is centralized at a single access point, because all interaction is mediated by the queueing layer. The queueing layer buffers messages passed between the other two layers. This eliminates the complexities of locking and serialization that would otherwise be necessary if the synchronous and asynchronous service layers accessed objects in each other's memory directly.

The Half-Sync/Half-Async pattern also has the following liabilities:

A boundary-crossing penalty may be incurred from context switching, synchronization, and data copying overhead when data is transferred between the synchronous and asynchronous service layers via the queueing layer. For example, most operating systems implement the Half-Sync/Half-Async pattern by placing the queueing layer at the boundary between the user-level and kernel-level protection domains. A significant performance penalty can be incurred when crossing this boundary [HP91].

One way of reducing this overhead is to share a region of memory between the synchronous service layer and the asynchronous service layer [DP93]. This 'zero-copy' design allows the two layers to exchange data directly, without copying data into and out of the queueing layer.
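On a POSIX platform, a user-level variant of this design might set up the shared region as sketched below; the region name, its size, and the layout of the buffers inside it are left open, and sharing between user level and the kernel would require platform-specific mechanisms instead.

#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Map a named shared-memory region that both layers can attach to, so
// that message payloads can be exchanged without copying them through
// the queueing layer; only small descriptors need to be queued.
void* map_shared_region(const char* name, std::size_t size) {
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return nullptr;
    if (ftruncate(fd, static_cast<off_t>(size)) != 0) {
        close(fd);
        return nullptr;
    }
    void* region = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
    close(fd);                         // the mapping remains valid after close()
    return region == MAP_FAILED ? nullptr : region;
}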


[CP95] presents a set of extensions to the BSD UNIX I/O subsystem that minimizes boundary-crossing penalties by using polled interrupts to improve the handling of continuous media I/O streams. This approach defines a buffer management system that allows efficient page re-mapping and shared memory mechanisms to be used between application processes, the kernel, and its devices.

Higher-level application services may not benefit from the efficiency of asynchronous I/O. Depending on the design of operating system or application framework interfaces, it may not be possible for higher-level services to use low-level asynchronous I/O devices effectively. The BSD UNIX operating system, for example, prevents applications from using certain types of hardware efficiently, even if external sources of I/O support asynchronous overlapping of computation and communication.

Complexity of debugging and testing. Applications written using the Half-Sync/Half-Async pattern can incur the same debugging and testing challenges described in the Consequences sections of the Proactor (215) and Reactor (179) patterns.

See Also

The Proactor pattern (215) can be viewed as an extension of the Half-Sync/Half-Async pattern that propagates asynchronous control and data operations all the way up to higher-level services. In general, the Proactor pattern should be applied if an operating system platform supports asynchronous I/O efficiently and application developers are comfortable with the asynchronous I/O programming model.

The Reactor pattern (179) can be used in conjunction with the Active Object pattern (369) to implement the Half-Sync/Half-Reactive variant of the Half-Sync/Half-Async pattern. Similarly, the Leader/Followers (447) pattern can be used in lieu of the Half-Sync/Half-Async pattern if there is no need for a queueing layer between the asynchronous and synchronous layers.

The Pipes and Filters pattern [POSA1] describes several general principles for implementing producer-consumer communication between components in a software system. Certain configurations of the Half-Sync/Half-Async pattern can therefore be viewed as instances of the Pipes and Filters pattern, where filters contain entire layers of many finer-grained services. Moreover, a filter could contain active objects, which could yield the Half-Sync/Half-Reactive or Half-Sync/Half-Sync variants.

The Layers [POSA1] pattern describes the general principle of separating services into separate layers. The Half-Sync/Half-Async pattern can thus be seen as a specialization of the Layers pattern whose purpose is to separate synchronous processing from asynchronous processing in a concurrent system by introducing two designated layers for each type of service.

Credits

Chuck Cranor was the co-author of the original version of this pattern [PLoPD2]. We would also like to thank Lorrie Cranor and Paul McKenney for comments and suggestions for improving the pattern.

[6]An in-depth code walk-through showing how the Half-Sync/Half-Async pattern is applied in the BSD UNIX networking and file systems is described in [PLoPD2].


[7]Although this reactive layer is not truly asynchronous, it shares key properties with asynchronous services. In particular, event handlers dispatched by a reactor cannot block for long without starving other sources of events.

Leader/Followers

The Leader/Followers architectural pattern provides an efficient concurrency model where multiple threads take turns sharing a set of event sources in order to detect, demultiplex, dispatch, and process service requests that occur on the event sources.

Example

Consider the design of a multi-tier, high-volume, on-line transaction processing (OLTP) system [GR93]. In this design, front-end communication servers route transaction requests from remote clients, such as travel agents, claims processing centers, or point-of-sales terminals, to back-end database servers that process the requests transactionally. After a transaction commits, the database server returns its results to the associated communication server, which then forwards the results back to the originating remote client. This multi-tier architecture is used to improve overall system throughput and reliability via load balancing and redundancy, respectively. It also relieves back-end servers from the burden of managing different communication protocols with remote clients.

One way to implement OLTP servers is to use a single-threaded event processing model based on the Reactor pattern (179). However, this model serializes event processing, which degrades the overall server performance when handling long-running or blocking client request events. Likewise, single-threaded servers cannot benefit transparently from multi-processor platforms.

A common strategy for improving OLTP server performance is to use a multi-threaded concurrency model that processes requests from different clients and corresponding results simultaneously [HPS99]. For example, we could multi-thread an OLTP back-end server by creating a thread pool based on the Half-Sync/Half-Reactive variant of the Half-Sync/Half-Async pattern (423). In this design, the OLTP back-end server contains a dedicated network I/O thread that uses the select() [Ste98] event demultiplexer to wait for events to occur on a set of socket handles connected to front-end communication servers.


When activity occurs on handles in the set, select() returns control to the network I/O thread and indicates which socket handles in the set have events pending. The I/O thread then reads the transaction requests from the socket handles, stores them into dynamically allocated requests, and inserts these requests into a synchronized message queue implemented using the Monitor Object pattern (399). This message queue is serviced by a pool of worker threads. When a worker thread in the pool is available, it removes a request from the queue, performs the designated transaction, and then returns a response to the front-end communication server.

Although the threading model described above is used in many concurrent applications, it can incur excessive overhead when used for high-volume servers, such as those in our OLTP example. For instance, even with a light workload, the Half-Sync/Half-Reactive thread pool design will incur a dynamic memory allocation, multiple synchronization operations, and a context switch to pass a request message between the network I/O thread and a worker thread. These overheads make even the best-case latency unnecessarily high [PRS+99]. Moreover, if the OLTP back-end server is run on a multi-processor, significant overhead can occur from processor cache coherency protocols required to transfer requests between threads [SKT96].

If the OLTP back-end servers run on an operating system platform that supports asynchronous I/O efficiently, the Half-Sync/Half-Reactive thread pool can be replaced with a purely asynchronous thread pool based on the Proactor pattern (215). This alternative will reduce much of the synchronization, context switching, and cache coherency overhead outlined above by eliminating the network I/O thread. Unfortunately, many operating systems do not support asynchronous I/O and those that do often support it inefficiently.[8] Yet, it is essential that high-volume OLTP servers demultiplex requests efficiently to threads that can process the results concurrently.

Context

An event-driven application where multiple service requests arriving on a set of event sources must be processed efficiently by multiple threads that share the event sources.

Problem

Multi-threading is a common technique to implement applications that process multiple events concurrently. However, it is hard to implement high-performance multi-threaded server applications. These applications often process a high volume of multiple types of events, such as CONNECT, READ, and WRITE events in our OLTP example, that arrive simultaneously. To address this problem effectively, three forces must be resolved:

§ Service requests can arrive from multiple event sources, such as multiple TCP/IP socket handles [Ste98], that are allocated for each connected client. A key design force, therefore, is determining efficient demultiplexing associations between threads and event sources. In particular, associating a thread with each event source may be infeasible due to the scalability limitations of applications or the underlying operating system and network platforms.

§ For our OLTP server applications, it may not be practical to associate a separate thread with each socket handle. In particular, as the number of connections increases significantly, this design may not scale efficiently on many operating system platforms.

§ To maximize performance, key sources of concurrency-related overhead, such as context switching, synchronization, and cache coherency management, must be minimized. In particular, concurrency models that allocate memory dynamically for each request passed between multiple threads will incur significant overhead on conventional multi-processor operating systems [SchSu95].

§ Implementing our OLTP servers using the Half-Sync/Half-Reactive thread pool variant (423) outlined in the Example section requires memory to be allocated dynamically in the network I/O thread to store incoming transaction requests into the message queue. This design incurs numerous synchronizations and context switches to insert the request into, or remove the request from, the message queue, as illustrated in the Monitor Object pattern (399).

§ Multiple threads that demultiplex events on a shared set of event sources must coordinate to prevent race conditions. Race conditions can occur if multiple threads try to access or modify certain types of event sources simultaneously.

§ For instance, a pool of threads cannot use select() concurrently to demultiplex a set of socket handles because the operating system will erroneously notify more than one thread calling select() when I/O events are pending on the same set of socket handles [Ste98]. Moreover, for bytestream-oriented protocols, such as TCP, having multiple threads invoking read() or write() on the same socket handle will corrupt or lose data.

Solution

Structure a pool of threads to share a set of event sources efficiently by taking turns demultiplexing events that arrive on these event sources and synchronously dispatching the events to application services that process them.


In detail: design a thread pool mechanism that allows multiple threads to coordinate themselves and protect critical sections while detecting, demultiplexing, dispatching, and processing events. In this mechanism, allow one thread at a time—the leader—to wait for an event to occur on a set of event sources. Meanwhile, other threads—the followers—can queue up waiting their turn to become the leader. After the current leader thread detects an event from the event source set, it first promotes a follower thread to become the new leader. It then plays the role of a processing thread, which demultiplexes and dispatches the event to a designated event handler that performs application-specific event handling in the processing thread. Multiple processing threads can handle events concurrently while the current leader thread waits for new events on the set of event sources shared by the threads. After handling its event, a processing thread reverts to a follower role and waits to become the leader thread again.
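The following condensed sketch shows one way this coordination protocol can be expressed, using a mutex and condition variable as the thread pool synchronizer; the handle set wait and the event processing are reduced to placeholder functions, so this is an illustration of the hand-off protocol rather than a complete implementation.

#include <condition_variable>
#include <mutex>

// Placeholders: a real implementation would block in select() or a
// similar demultiplexer on the shared handle set, and would dispatch
// the event to its associated event handler.
struct Event { int handle = -1; };
Event wait_for_event_on_handle_set() { return Event{}; }
void process_event(const Event&) { /* application-specific handling */ }

class LeaderFollowersPool {
public:
    // Every thread in the pool runs this loop, taking turns as leader.
    void join() {
        for (;;) {
            {
                std::unique_lock<std::mutex> lock(mutex_);
                // Followers queue up here until no thread is leading.
                no_leader_.wait(lock, [this] { return !leader_active_; });
                leader_active_ = true;            // become the leader
            }

            Event event = wait_for_event_on_handle_set();  // only the leader waits

            {
                std::lock_guard<std::mutex> lock(mutex_);
                leader_active_ = false;           // give up leadership first
            }
            no_leader_.notify_one();              // promote a follower

            process_event(event);                 // act as a processing thread
        }                                         // then rejoin as a follower
    }

private:
    std::mutex mutex_;
    std::condition_variable no_leader_;
    bool leader_active_ = false;
};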

Structure

There are four key participants in the Leader/Followers pattern:

Handles are provided by operating systems to identify event sources, such as network connections or open files, that can generate and queue events. Events can originate from external sources, such as CONNECT events or READ events sent to a service from clients, or internal sources, such as time-outs. A handle set is a collection of handles that can be used to wait for one or more events to occur on handles in the set. A handle set returns to its caller when it is possible to initiate an operation on a handle in the set without the operation blocking.

OLTP servers are interested in two types of events—CONNECT events and READ events—which represent incoming connections and transaction requests, respectively. Both front-end and back-end servers maintain a separate connection for each client, where clients of front-end servers are the so-called 'remote' clients and front-end servers themselves are clients of back-end servers. Each connection is a source of events that is represented in a server by a separate socket handle. Our OLTP servers use the select() event demultiplexer, which identifies handles whose event sources have pending events, so that applications can invoke I/O operations on these handles without blocking the calling threads.


An event handler specifies an interface consisting of one or more hook methods [Pree95] [GoF95]. These methods represent the set of operations available to process application-specific events that occur on handle(s) serviced by an event handler.

Concrete event handlers specialize the event handler and implement a specific service that the application offers. In particular, concrete event handlers implement the hook method(s) responsible for processing events received from a handle.

For example, concrete event handlers in OLTP front-end communication servers receive and validate remote client requests, and then forward requests to back-end database servers. Likewise, concrete event handlers in back-end database servers receive transaction requests from front-end servers, read/write the appropriate database records to perform the transactions, and return the results to the front-end servers. All network I/O operations are performed via socket handles, which identify various sources of events.

At the heart of the Leader/Followers pattern is a thread pool, which is a group of threads that share a synchronizer, such as a semaphore or condition variable, and implement a protocol for coordinating their transition between various roles. One or more threads play the follower role and queue up on the thread pool synchronizer waiting to play the leader role. One of these threads is selected to be the leader, which waits for an event to occur on any handle in its handle set. When an event occurs, the current leader thread promotes a follower thread to become the new leader. The original leader then concurrently plays the role of a processing thread, which demultiplexes that event from the handle set to an appropriate event handler and dispatches the handler's hook method to handle the event. After a processing thread is finished handling an event, it returns to playing the role of a follower thread and waits on the thread pool synchronizer for its turn to become the leader thread again.

Each OLTP server designed using the Leader/Followers pattern can have a pool of threads waiting to process transaction requests that arrive on event sources identified by a handle set. At any point in time, multiple threads in the pool can be processing transaction requests and sending results back to their clients. One thread in the pool is the current leader, which waits for a new CONNECT or READ event to arrive on the handle set shared by the threads. When this occurs, the leader thread becomes a processing thread and handles the event, while one of the follower threads in the pool is promoted to become the new leader.

The following class diagram illustrates the structure of participants in the Leader/Followers pattern. In this structure, multiple threads share the same instances of thread pool, event handler, and handle set participants. The thread pool ensures the correct and efficient coordination of the threads:


Dynamics

The collaborations in the Leader/Followers pattern divide into four phases:

§ Leader thread demultiplexing. The leader thread waits for an event to occur on any handle in the handle set. If there is no current leader thread, for example, due to events arriving faster than the available threads can service them, the underlying operating system can queue events internally until a leader thread is available.

§ Follower thread promotion. After the leader thread has detected a new event, it uses the thread pool to choose a follower thread to become the new leader.

§ Event handler demultiplexing and event processing. After helping to promote a follower thread to become the new leader, the former leader thread then plays the role of a processing thread. This thread concurrently demultiplexes the event it detected to the event's associated handler and then dispatches the handler's hook method to process the event. A processing thread can execute concurrently with the leader thread and any other threads that are in the processing state.

§ Rejoining the thread pool. After the processing thread has run its event handling to completion, it can rejoin the thread pool and wait to process another event. A processing thread can become the leader immediately if there is no current leader thread. Otherwise, the processing thread returns to playing the role of a follower thread and waits on the thread pool synchronizer until it is promoted by a leader thread.

A thread's transitions between states can be visualized in the following diagram:


Implementation

Six activities can be used to implement the Leader/Followers pattern:

1. Choose the handle and handle set mechanisms. A handle set is a collection of handles that a leader thread can use to wait for an event to occur on a set of event sources. Developers often choose the handles and handle set mechanisms provided by the underlying operating system, rather than implementing them from scratch. Four sub-activities help with choosing the handle and handle set mechanisms:

1. Determine the type of handles. There are two general types of handles:

§ Concurrent handles. This type allows multiple threads to access a handle to an event source concurrently without incurring race conditions that can corrupt, lose, or scramble the data [Ste98]. For instance, the Socket API for record-oriented protocols, such as UDP, allows multiple threads to invoke read() or write() operations on the same handle concurrently.

§ Iterative handles. This type requires multiple threads to access a handle to an event source iteratively because concurrent access will incur race conditions. For instance, the Socket API for bytestream-oriented protocols, such as TCP, does not guarantee that read() or write() operations respect application-level message boundaries. Thus, corrupted or lost data can result if I/O operations on the Socket are not serialized properly.

2. Determine the type of handle set. There are two general types of handle sets:

§ Concurrent handle set. This type can be acted upon concurrently, for example, by a pool of threads. Each time it becomes possible to initiate an operation on a handle in the set without blocking the operation, a concurrent handle set returns that handle to one of its calling threads. For example, the Win32 WaitForMultipleObjects() function [Sol98] supports concurrent handle sets by allowing a pool of threads to wait on the same set of handles simultaneously.

§ Iterative handle set. This type returns to its caller when it is possible to initiate an operation on one or more handles in the set without the operation(s) blocking. Although an iterative handle set can return multiple handles in a single call, it can only be called by one thread at a time. For example, the select() [Ste98] and poll() [Rago93] functions support iterative handle sets. Thus, a pool of threads cannot use select() or poll() to demultiplex events on the same handle set concurrently because multiple threads can be notified that the same I/O events are pending, which elicits erroneous behavior.

The following table summarizes representative examples for each combination of concurrent and iterative handles and handle sets:

Handle Sets              Concurrent Handles                        Iterative Handles
Concurrent Handle Sets   UDP Sockets + WaitForMultipleObjects()    TCP Sockets + WaitForMultipleObjects()
Iterative Handle Sets    UDP Sockets + select()/poll()             TCP Sockets + select()/poll()

3. Determine the consequences of selecting certain handle and handle set mechanisms. In general, the Leader/Followers pattern is used to prevent multiple threads from corrupting or losing data erroneously, such as invoking read operations on a shared TCP bytestream socket handle concurrently or invoking select() on a shared handle set concurrently. However, some applications need not guard against these problems. In particular, if the handle and handle set mechanisms are both concurrent, many of the subsequent implementation activities can be skipped.

As discussed in implementation activities 1.1 (456) and 1.2 (457), the semantics of certain combinations of protocols and network programming APIs support multiple concurrent I/O operations on a shared handle. For example, UDP support in the Socket API ensures a complete message is always read or written by one thread or another, without the risk of a partial read() or of data corruption from an interleaved write(). Likewise, certain handle set mechanisms, such as the Win32 WaitForMultipleObjects() function [Sol98], return a single handle per call, which allows them to be called concurrently by a pool of threads.[9]

In these situations, it may be possible to implement the Leader/Followers pattern by simply using the operating system's thread scheduler to (de)multiplex threads, handle sets, and handles robustly, in which case, implementation activities 2 through 6 can be skipped.
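A sketch of that fully concurrent case follows: several threads block in recvfrom() on one shared UDP socket and the kernel delivers each datagram to exactly one of them. The port number and pool size are arbitrary, and request processing is elided.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <thread>
#include <vector>

// Worker: blocks in recvfrom() on the shared UDP socket. Because UDP is
// record-oriented, concurrent calls on the same handle are safe.
void udp_worker(int udp_socket) {
    char datagram[1500];
    for (;;) {
        sockaddr_in peer{};
        socklen_t peer_len = sizeof(peer);
        ssize_t n = recvfrom(udp_socket, datagram, sizeof(datagram), 0,
                             reinterpret_cast<sockaddr*>(&peer), &peer_len);
        if (n < 0)
            continue;
        // ... process the request and reply with sendto() ...
    }
}

int main() {
    int udp_socket = socket(AF_INET, SOCK_DGRAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(9999);                  // arbitrary port
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    bind(udp_socket, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));

    std::vector<std::thread> pool;
    for (int i = 0; i < 4; ++i)                   // arbitrary pool size
        pool.emplace_back(udp_worker, udp_socket);
    for (auto& t : pool)
        t.join();
}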

1.4 Implement an event handler demultiplexing mechanism. In addition to calling an event demultiplexer to wait for one or more events to occur on its handle set, such as select(), a Leader/Followers pattern implementation must demultiplex events to event handlers and dispatch their hook methods to process the events. In general, two alternative strategies can be used to implement this mechanism:

§ Program to a low-level operating system event demultiplexing mechanism. In this strategy, the handle set demultiplexing mechanisms provided by the operating system are used directly. Thus, a Leader/Followers implementation must maintain a demultiplexing table that is a manager [Som97] containing a set of <handle, event handler, event types> tuples. Each handle serves as a 'key' that associates handles with event handlers in its demultiplexing table, which also stores the type of event(s), such as CONNECT and READ, that each event handler will process. The contents of this table are converted into handle sets passed to the native event demultiplexing mechanism, such as select() [Ste98] or WaitForMultipleObjects() [Sol98].


§ Implementation activity 3.3 of the Reactor pattern (179) illustrates how to implement a demultiplexing table; a simplified sketch also appears after this list of strategies.

§ Program to a higher-level event demultiplexing pattern. In this strategy, developers leverage higher-level patterns, such as Reactor (179), Proactor (215), and Wrapper Facade (47). These patterns help to simplify the Leader/Followers implementation and reduce the effort needed to address the accidental complexities of programming to native operating system handle set demultiplexing mechanisms directly. Moreover, applying higher-level patterns makes it easier to decouple the I/O and demultiplexing aspects of a system from its concurrency model, thereby reducing code duplication and maintenance effort.
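As a rough illustration of the first, low-level strategy, the following sketch shows one way such a demultiplexing table could be represented and converted into an fd_set for select(). The Event_Handler forward declaration, the Event_Type typedef and the table size are assumptions for this sketch rather than part of any specific framework.

#include <sys/select.h>

class Event_Handler;                  // hypothetical handler base class
typedef unsigned long Event_Type;     // e.g., a bit-mask of CONNECT, READ, ...

// One <handle, event handler, event types> tuple; the handle itself is
// the index into the table below.
struct Demux_Entry {
  Event_Handler *handler;             // 0 means "no handler registered"
  Event_Type events;                  // event types this handler processes
};

const int MAX_HANDLES = FD_SETSIZE;   // illustrative upper bound

struct Demux_Table {
  Demux_Entry entries[MAX_HANDLES];   // indexed directly by handle value

  // Convert the registered handles into an fd_set suitable for select();
  // returns the 'width' argument select() expects (max handle + 1).
  int make_fd_set (fd_set &readers) const {
    FD_ZERO (&readers);
    int width = 0;
    for (int h = 0; h < MAX_HANDLES; ++h)
      if (entries[h].handler != 0) {
        FD_SET (h, &readers);
        width = h + 1;
      }
    return width;
  }
};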

§ In our OLTP server example, an event must be demultiplexed to the concrete event handler associated with the socket handle that received the event. The Reactor pattern (179) supports this activity, so it can be applied to simplify the implementation of the Leader/Followers pattern. In the context of the Leader/Followers pattern, however, a reactor demultiplexes just one handle at a time to its associated concrete event handler, regardless of how many handles have events pending on them. Demultiplexing only one handle at a time can maximize the concurrency among a pool of threads and simplify a Leader/Followers pattern implementation by alleviating its need to manage a separate queue of pending events.

2. Implement a protocol for temporarily (de)activating handles in a handle set. When an event arrives, the leader thread performs three steps:
§ It deactivates the handle from consideration in the handle set temporarily,
§ It promotes a follower thread to become the new leader and
§ It continues to process the event.

Deactivating the handle from the handle set avoids race conditions that could occur between the time when a new leader is selected and the event is processed. If the new leader waits on the same handle in the handle set during this interval, it could demultiplex the event a second time, which is erroneous because the dispatch is already in progress. After the event is processed, the handle is reactivated in the handle set, which allows the leader thread to wait for an event to occur on it or any other activated handles in the set.

In our OLTP example, a handle deactivation and reactivation protocol can be provided by extending the Reactor interface defined in implementation activity 2 of the Reactor pattern (179):

class Reactor {
public:
  // Temporarily deactivate the <HANDLE>
  // from the internal handle set.
  void deactivate_handle (HANDLE, Event_Type);

  // Reactivate a previously deactivated
  // <Event_Handler> to the internal handle set.
  void reactivate_handle (HANDLE, Event_Type);

  // ...
};

3. Implement the thread pool. To promote a follower thread to the leader role, as well as to determine which thread is the current leader, an implementation of the Leader/Followers pattern must manage a pool of threads. A straightforward way to implement this is to have all the follower threads in the set simply wait on a single synchronizer, such as a semaphore or condition variable. In this design, it does not matter which thread processes an event, as long as all threads in the pool that share the handle set are serialized.

4. For example, the LF_Thread_Pool class shown below can be used for the back-end database servers in our OLTP example:

5. class LF_Thread_Pool { 6. public: 7. // Constructor. 8. LF_Thread_Pool (Reactor *r): reactor_ (r) { } 9. 10. // Threads call <join> to wait on a handle set

and 11. // demultiplex events to their event handlers. 12. void join (Time_Value *timeout = 0); 13. 14. // Promote a follower thread to become the 15. // leader thread. 16. void promote_new_leader (); 17. // Support the <HANDLE> (de)activation

protocol. 18. void deactivate_handle (HANDLE, Event_Type

et); 19. void reactivate_handle (HANDLE, Event_Type

et); 20. private: 21. // Pointer to the event

demultiplexer/dispatcher. 22. Reactor *reactor_; 23.

Page 74: Chapter 5: Concurrency Patterns

390

24. // The thread id of the leader thread, which is

25. // set to NO_CURRENT_LEADER if there is no leader.

26. Thread_Id leader_thread_; 27. 28. // Follower threads wait on this condition 29. // variable until they are promoted to leader. 30. Thread_Condition followers_condition_; 31. 32. // Serialize access to our internal state. 33. Thread_Mutex mutex_; 34. };

The constructor of LF_Thread_Pool caches the reactor passed to it. By default, this reactor implementation uses select(), which supports iterative handle sets. Therefore, LF_Thread_Pool is responsible for serializing multiple threads that take turns calling select() on the reactor's handle set.

Application threads invoke join() to wait on a handle set and demultiplex new events to their associated event handlers. As shown in implementation activity 4 (462), this method does not return to its caller until the application terminates or join() times out. The promote_new_leader() method promotes one of the follower threads in the set to become the new leader, as shown in implementation activity 5.2 (464).

The deactivate_handle() and reactivate_handle() methods deactivate and reactivate handles within a reactor's handle set. The implementations of these methods simply forward to the same methods defined in the Reactor interface shown in implementation activity 2 (459).

Note that a single condition variable synchronizer, followers_condition_, is shared by all threads in this thread pool. As shown in implementation activities 4 (462) and 5 (463), the implementation of LF_Thread_Pool uses the Monitor Object pattern (399).

4. Implement a protocol to allow threads to initially join (and later rejoin) the thread pool. This protocol is used in the following two cases:
§ After the initial creation of a pool of threads that retrieve and process events and
§ After a processing thread completes and is available to handle another event.

If no leader thread is available, a processing thread can become the leader immediately. If a leader thread is already available, a thread can become a follower by waiting on the thread pool's synchronizer.


Our back-end database servers can implement the following join() method of the LF_Thread_Pool to wait on a handle set and demultiplex new events to their associated event handlers:

void LF_Thread_Pool::join (Time_Value *timeout) {
  // Use Scoped Locking idiom to acquire mutex
  // automatically in the constructor.
  Guard<Thread_Mutex> guard (mutex_);

  for (;;) {
    while (leader_thread_ != NO_CURRENT_LEADER)
      // Sleep and release <mutex> atomically.
      followers_condition_.wait (timeout);

    // Assume the leader role.
    leader_thread_ = Thread::self ();

    // Leave monitor temporarily to allow other
    // follower threads to join the pool.
    guard.release ();

    // After becoming the leader, the thread uses
    // the reactor to wait for an event.
    reactor_->handle_events ();

    // Reenter monitor to serialize the test
    // for <leader_thread_> in the while loop.
    guard.acquire ();
  }
}

Within the for loop, the calling thread alternates between its roles as leader, processing, and follower thread. In the first part of this loop, the thread waits until it can be a leader, at which point it uses the reactor to wait for an event on the shared handle set. When the reactor detects an event on a handle, it will demultiplex the event to its associated event handler and dispatch its handle_event() method to promote a new leader and process the event. After the reactor demultiplexes one event, the thread re-assumes its follower role. These steps continue until the application terminates or a timeout occurs.

5. Implement the follower promotion protocol. Immediately after a leader thread detects an event, but before it demultiplexes the event to its event handler and processes the event, it must promote a follower thread to become the new leader. Two sub-activities can be used to implement this protocol:

5.1 Implement the handle set synchronization protocol. If the handle set is iterative and we blindly promote a new leader thread, it is possible that the new leader thread will attempt to handle the same event that was detected by the previous leader thread that is in the midst of processing the event. To avoid this race condition, we must remove the handle from consideration in the handle set before promoting a follower to new leader and dispatching the event to its concrete event handler. The handle must be reactivated in the handle set after the event has been dispatched and processed.

An application can implement concrete event handlers that subclass from the Event_Handler class defined in implementation activity 1.2 of the Reactor pattern (179). Likewise, the Leader/Followers implementation can use the Decorator pattern [GoF95] to create an LF_Event_Handler class that decorates Event_Handler. This decorator promotes a new leader thread and activates/deactivates the handler in the reactor's handle set transparently to the concrete event handlers.

class LF_Event_Handler : public Event_Handler {
public:
  LF_Event_Handler (Event_Handler *eh, LF_Thread_Pool *tp)
    : concrete_event_handler_ (eh), thread_pool_ (tp) { }

  virtual void handle_event (HANDLE h, Event_Type et) {
    // Temporarily deactivate the handler in the
    // reactor to prevent race conditions.
    thread_pool_->deactivate_handle (h, et);

    // Promote a follower thread to become leader.
    thread_pool_->promote_new_leader ();

    // Dispatch application-specific event
    // processing code.
    concrete_event_handler_->handle_event (h, et);

    // Reactivate the handle in the reactor.
    thread_pool_->reactivate_handle (h, et);
  }

private:
  // This use of <Event_Handler> plays the
  // <ConcreteComponent> role in the Decorator
  // pattern, which is used to implement
  // the application-specific functionality.
  Event_Handler *concrete_event_handler_;

  // Instance of an <LF_Thread_Pool>.
  LF_Thread_Pool *thread_pool_;
};

5.2 Determine the promotion protocol ordering. Several ordering strategies can be used to determine which follower thread to promote:

§ LIFO order. In many applications, it does not matter which of the follower threads is promoted next because all threads are equivalent peers. In this case, the leader thread can promote follower threads in last-in, first-out (LIFO) order. The LIFO protocol maximizes CPU cache affinity [SKT96] [MB91] by ensuring that the thread waiting the shortest time is promoted first [Sol98], which is an example of the Fresh Work Before Stale pattern [Mes96].

Cache affinity can improve system performance if the thread that blocked most recently executes essentially the same code and data when it is scheduled to run again. Implementing a LIFO promotion protocol requires an additional data structure, however, such as a stack of waiting threads, rather than just using a native operating system synchronization object, such as a semaphore. (A sketch of one such stack appears after this list of strategies.)

§ Priority order. In some applications, particularly real-time applications, threads may run at different priorities. In this case, it may be necessary to promote follower threads according to their priority. This protocol can be implemented using some type of priority queue, such as a heap [BaLee98]. Although it is more complex than the LIFO protocol, promoting follower threads according to their priorities may be necessary to minimize priority inversion [SMFG00].

§ Implementation-defined order. This ordering is most common when implementing handle sets using operating system synchronizers, such as semaphores or condition variables, which often dispatch waiting threads in an implementation-defined order. The advantage of this protocol is that it maps onto native operating system synchronizers efficiently.
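The following sketch shows one possible shape for the LIFO promotion stack mentioned above, assuming each waiting follower blocks on its own condition variable so the leader can wake precisely the thread that blocked most recently. The class and member names, the fixed stack bound and the absence of overflow handling are simplifications for illustration.

#include <pthread.h>

// Each waiting follower is represented by its own condition variable so
// that the leader can wake the most recently pushed follower specifically.
struct Follower {
  pthread_cond_t cond;
  bool promoted;
};

class LIFO_Follower_Stack {
public:
  LIFO_Follower_Stack (): top_ (0) { pthread_mutex_init (&lock_, 0); }

  // Called by a thread that becomes a follower: push itself and wait.
  void wait_as_follower () {
    Follower self;
    pthread_cond_init (&self.cond, 0);
    self.promoted = false;

    pthread_mutex_lock (&lock_);
    stack_[top_++] = &self;             // push in LIFO order (no bound check)
    while (!self.promoted)
      pthread_cond_wait (&self.cond, &lock_);
    pthread_mutex_unlock (&lock_);

    pthread_cond_destroy (&self.cond);
  }

  // Called by the leader: promote the follower that blocked most recently.
  void promote_newest () {
    pthread_mutex_lock (&lock_);
    if (top_ > 0) {
      Follower *f = stack_[--top_];     // pop in LIFO order
      f->promoted = true;
      pthread_cond_signal (&f->cond);
    }
    pthread_mutex_unlock (&lock_);
  }

private:
  enum { MAX_FOLLOWERS = 64 };          // illustrative bound
  Follower *stack_[MAX_FOLLOWERS];
  int top_;
  pthread_mutex_t lock_;
};

A production implementation would bound the stack properly and integrate this promotion step with the leader bookkeeping shown in join() above.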

§ Our OLTP back-end database servers could use the following simple protocol to promote follower threads in whatever order they are queued by a native operating system condition variable:

void LF_Thread_Pool::promote_new_leader () {
  // Use Scoped Locking idiom to acquire mutex
  // automatically in the constructor.
  Guard<Thread_Mutex> guard (mutex_);

  if (leader_thread_ != Thread::self ())
    throw /* ...only leader thread can promote... */;

  // Indicate that we are no longer the leader
  // and notify a <join> method to promote
  // the next follower.
  leader_thread_ = NO_CURRENT_LEADER;
  followers_condition_.notify ();

  // Release mutex automatically in destructor.
}

§ As shown in implementation activity 5.1 (463), the promote_new_leader() method is invoked by an LF_Event_Handler decorator before it forwards to the concrete event handler that processes an event.

6. Implement the event handlers. Application developers must decide what actions to perform when the hook method of a concrete event handler is invoked by a processing thread in the Leader/Followers pattern implementation. Implementation activity 5 in the Reactor pattern (179) describes various issues associated with implementing concrete event handlers.

Example Resolved

The OLTP back-end database servers described in the Example section can use the Leader/Followers pattern to implement a thread pool that demultiplexes I/O events from socket handles to their event handlers efficiently. In this design, there is no designated network I/O thread. Instead, a pool of threads is pre-allocated during database server initialization:

const int MAX_THREADS = /* ... */;

// Forward declaration.
void *worker_thread (void *);

int main () {
  LF_Thread_Pool thread_pool (Reactor::instance ());

  // Code to set up a passive-mode Acceptor omitted.

  for (int i = 0; i < MAX_THREADS - 1; ++i)
    Thread_Manager::instance ()->spawn
      (worker_thread, &thread_pool);

  // The main thread participates in the thread pool.
  thread_pool.join ();
}

These threads are not bound to any particular socket handle. Thus, all threads in this pool take turns playing the role of a network I/O thread by invoking the LF_Thread_Pool::join() method:

void *worker_thread (void *arg) {
  LF_Thread_Pool *thread_pool =
    static_cast<LF_Thread_Pool *> (arg);

  // Each worker thread participates in the thread pool.
  thread_pool->join ();
  return 0;
}

As shown in implementation activity 4 (462), the join() method allows only the leader thread to use the Reactor singleton to select() on a shared handle set of Sockets connected to OLTP front-end communication servers. If requests arrive when all threads are busy, they will be queued in socket handles until threads in the pool are available to execute the requests.

When a request event arrives, the leader thread deactivates the socket handle temporarily from consideration in select()'s handle set, promotes a follower thread to become the new leader, and continues to handle the request event as a processing thread. This processing thread then reads the request into a buffer that resides in the runtime stack or is allocated using the Thread-Specific Storage pattern (475).[10] All OLTP activities occur in the processing thread. Thus, no further context switching, synchronization, or data movement is necessary until the processing completes. When it finishes handling a request, the processing thread returns to playing the role of a follower and waits on the synchronizer in the thread pool. Moreover, the socket handle it was processing is reactivated in the Reactor singleton's handle set so that select() can wait for I/O events to occur on it, along with other Sockets in the handle set.

Variants

Bound Handle/Thread Associations. The earlier sections in this pattern describe unbound handle/thread associations, where there is no fixed association between threads and handles. Thus, any thread can process any event that occurs on any handle in a handle set. Unbound associations are often used when a pool of worker threads take turns demultiplexing a shared handle set.

A variant of the Leader/Followers pattern uses bound handle/thread associations. In this variant, each thread is bound to its own handle, which it uses to process particular events. Bound associations are often used in the client-side of an application when a thread waits on a socket handle for a response to a two-way request it sent to a server. In this case, the client application thread expects to process the response event on this handle in the same thread that sent the original request.

In the bound handle/thread association variant, therefore, the leader thread in the thread pool may need to hand-off an event to a follower thread if the leader does not have the necessary context to process the event. After the leader detects a new event, it checks the handle associated with the event to determine which thread is responsible for processing it.


If the leader thread discovers that it is responsible for the event, it promotes a follower thread to become the new leader. Conversely, if the event is intended for another thread, the leader must hand off the event to the designated follower thread. This follower thread can then temporarily disable the handle and process the event. Meanwhile, the current leader thread continues to wait for another event to occur on the handle set.

The following diagram illustrates the additional transition between the following state and the processing state:

The leader/follower thread pool can be maintained implicitly, for example, using a synchronizer, such as a semaphore or condition variable, or explicitly, using a container and the Manager pattern [Som97]. The choice depends largely on whether the leader thread must notify a specific follower thread explicitly to perform event hand-offs.
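To make the explicit hand-off concrete, here is a minimal sketch of a per-thread 'mailbox' a leader could use to pass an event to the follower thread bound to a handle. The Event structure and the one-slot design are simplifying assumptions, and locating the right mailbox for a given handle is left to a lookup not shown here.

#include <pthread.h>

struct Event { int handle; /* ... request data ... */ };

// A one-slot mailbox owned by each bound follower thread.
class Event_Mailbox {
public:
  Event_Mailbox (): full_ (false) {
    pthread_mutex_init (&lock_, 0);
    pthread_cond_init (&ready_, 0);
  }

  // Leader side: deposit an event for the bound follower and wake it up.
  void hand_off (const Event &e) {
    pthread_mutex_lock (&lock_);
    event_ = e;
    full_ = true;
    pthread_cond_signal (&ready_);
    pthread_mutex_unlock (&lock_);
  }

  // Follower side: block until the leader hands off an event.
  Event wait_for_event () {
    pthread_mutex_lock (&lock_);
    while (!full_)
      pthread_cond_wait (&ready_, &lock_);
    full_ = false;
    Event e = event_;
    pthread_mutex_unlock (&lock_);
    return e;
  }

private:
  Event event_;
  bool full_;
  pthread_mutex_t lock_;
  pthread_cond_t ready_;
};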

A detailed discussion of the bound handle/thread association variant and its implementation appears in [SRPKB00].

Relaxing Serialization Constraints. There are operating systems where multiple leader threads can wait on a handle set simultaneously. For example, the Win32 function WaitForMultipleObjects() [Sol98] supports concurrent handle sets that allow a pool of threads to wait on the same set of handles concurrently. Thus, a thread pool designed using this function can take advantage of multi-processor hardware to handle multiple events concurrently while other threads wait for events.

Two variations of the Leader/Followers pattern can be applied to allow multiple leader threads to be active simultaneously:

§ Leader/followers per multiple handle sets. This variation applies the conventional Leader/Followers implementation to multiple handle sets separately. For instance, each thread is assigned a designated handle set. This variation is particularly useful in applications where multiple handle sets are available. However, it limits a thread to using a specific handle set.

§ Multiple leaders and multiple followers. In this variation, the pattern is extended to support multiple simultaneous leader threads, where any of the leader threads can wait on any handle set. When a thread re-joins the thread pool it checks if a leader is associated with every handle set already. If there is a handle set without a leader, the re-joining thread can become the leader of that handle set immediately.

Hybrid Thread Associations. Some applications use hybrid designs that implement both bound and unbound handle/thread associations simultaneously. For example, some handles in an application may have dedicated threads to handle certain events, whereas other handles can be processed by any thread. Thus, one variant of the Leader/Followers pattern uses its event hand-off mechanism to notify certain subsets of threads, according to the handle on which event activity occurs.


For example, the OLTP front-end communication server may have multiple threads using the Leader/Followers pattern to wait for new request events from clients. Likewise, it may also have threads waiting for responses to requests they invoked on back-end servers. In fact, threads play both roles over their lifetime, starting as threads that dispatch new incoming requests, then issuing requests to the back-end servers to satisfy the client application requirements, and finally waiting for responses to arrive from the back-end server.

Hybrid Client/Servers. In complex systems, where peer applications play both client and server roles, it is important that the communication infrastructure processes incoming requests while waiting for one or more replies. Otherwise, the system can deadlock because one client dedicates all its threads to block waiting for responses.

In this variant, the binding of threads and handles changes dynamically. For example, a thread may be unbound initially, yet while processing an incoming request the application discovers it requires a service provided by another peer in the distributed system. In this case, the unbound thread dispatches a new request while executing application code, effectively binding itself to the handle used to send the request. Later, when the response arrives and the thread completes the original request, it becomes unbound again.

Alternative Event Sources and Sinks. Consider a system where events are obtained not only through handles but also from other sources, such as shared memory or message queues. For example, in UNIX there are no event demultiplexing functions that can wait for I/O events, semaphore events, and/or message queue events simultaneously. However, a thread can block waiting for only one type of event at a time. Thus, the Leader/Followers pattern can be extended to wait for more than one type of event simultaneously:
§ A leader thread is assigned to each source of events, as opposed to a single leader thread for the complete system.
§ After the event is received, but before processing the event, a leader thread can select any follower thread to wait on this event source.

A drawback with this variant, however, is that the number of participating threads must always be greater than the number of event sources. Therefore, this approach may not scale well as the number of event sources grows.

Known Uses

ACE Thread Pool Reactor framework [Sch97]. The ACE framework provides an object-oriented framework implementation of the Leader/Followers pattern called the 'thread pool reactor' (ACE_TP_Reactor) that demultiplexes events to event handlers within a pool of threads. When using a thread pool reactor, an application pre-spawns a fixed number of threads. When these threads invoke the ACE_TP_Reactor's handle_events() method, one thread will become the leader and wait for an event. Threads are considered unbound by the ACE thread pool reactor framework. Thus, after the leader thread detects the event, it promotes an arbitrary thread to become the next leader and then demultiplexes the event to its associated event handler.

CORBA ORBs and Web servers. Many CORBA implementations, including Chorus COOL ORB [SMFG00] and TAO [SC99], use the Leader/Followers pattern for both their client-side connection model and the server-side concurrency model. In addition, the JAWS Web server [HPS99] uses the Leader/Followers thread pool model for operating system platforms that do not allow multiple threads to simultaneously call accept() on a passive-mode socket handle.

Transaction monitors. Popular transaction monitors, such as Tuxedo, traditionally operate on a per-process basis, that is, transactions are always associated with a process. Contemporary OLTP systems demand high performance and scalability, however, and performing transactions on a per-process basis may fail to meet these requirements. Therefore, next-generation transaction services, such as implementations of the CORBA Transaction Service [OMG97b], employ bound Leader/Followers associations between threads and transactions.

Taxi stands. The Leader/Followers pattern is used in everyday life to organize many airport taxi stands. In this use case, taxi cabs play the role of the 'threads,' with the first taxi cab in line being the leader and the remaining taxi cabs being the followers. Likewise, passengers arriving at the taxi stand constitute the events that must be demultiplexed to the cabs, typically in FIFO order. In general, if any taxi cab can service any passenger, this scenario is equivalent to the unbound handle/thread association described in the main Implementation section. However, if only certain cabs can service certain passengers, this scenario is equivalent to the bound handle/thread association described in the Variants section.

Consequences

The Leader/Followers pattern provides several benefits:

Performance enhancements. Compared with the Half-Sync/Half-Reactive thread pool approach described in the Example section, the Leader/Followers pattern can improve performance as follows:

§ It enhances CPU cache affinity and eliminates the need for dynamic memory allocation and data buffer sharing between threads. For example, a processing thread can read the request into buffer space allocated on its run-time stack or by using the Thread-Specific Storage pattern (475) to allocate memory.

§ It minimizes locking overhead by not exchanging data between threads, thereby reducing thread synchronization. In bound handle/thread associations, the leader thread demultiplexes the event to its event handler based on the value of the handle. The request event is then read from the handle by the follower thread processing the event. In unbound associations, the leader thread itself reads the request event from the handle and processes it.

§ It can minimize priority inversion because no extra queueing is introduced in the server. When combined with real-time I/O subsystems [KSL99], the Leader/Followers thread pool model can reduce sources of non-determinism in server request processing significantly.

§ It does not require a context switch to handle each event, reducing the event dispatching latency. Note that promoting a follower thread to fulfill the leader role does require a context switch. If two events arrive simultaneously, this increases the dispatching latency for the second event, but the performance is no worse than that of Half-Sync/Half-Reactive thread pool implementations.

Programming simplicity. The Leader/Followers pattern simplifies the programming of concurrency models where multiple threads can receive requests, process responses, and demultiplex connections using a shared handle set.

However, the Leader/Followers pattern has the following liabilities:

Implementation complexity. The advanced variants of the Leader/Followers pattern are harder to implement than Half-Sync/Half-Reactive thread pools. In particular, when used as a multi-threaded connection multiplexer, the Leader/Followers pattern must maintain a pool of follower threads waiting to process requests. This set must be updated when a follower thread is promoted to a leader and when a thread rejoins the pool of follower threads. All these operations can happen concurrently, in an unpredictable order. Thus, the Leader/Followers pattern implementation must be efficient, while ensuring operation atomicity.

Lack of flexibility. Thread pool models based on the Half-Sync/Half-Reactive variant of the Half-Sync/Half-Async pattern (423) allow events in the queueing layer to be discarded or re-prioritized. Similarly, the system can maintain multiple separate queues serviced by threads at different priorities to reduce contention and priority inversion between events at different priorities. In the Leader/Followers model, however, it is harder to discard or reorder events because there is no explicit queue. One way to provide this functionality is to offer different levels of service by using multiple Leader/Followers groups in the application, each one serviced by threads at different priorities.
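For example, assuming the LF_Thread_Pool and Reactor classes from the Implementation section and the worker_thread() function from the Example Resolved section, such a multi-group design could be sketched as follows; how the threads of each group are actually given different scheduling priorities is platform-specific and omitted.

// One Leader/Followers group per service level. Each group has its own
// reactor (and therefore its own handle set) and its own pool of threads,
// e.g. LF_Thread_Pool high_prio_pool (high_prio_reactor). Assigning
// different scheduling priorities to each group's threads is not shown.
void start_prioritized_groups (LF_Thread_Pool *high_prio_pool,
                               LF_Thread_Pool *low_prio_pool,
                               int threads_per_group) {
  for (int i = 0; i < threads_per_group; ++i) {
    Thread_Manager::instance ()->spawn (worker_thread, high_prio_pool);
    Thread_Manager::instance ()->spawn (worker_thread, low_prio_pool);
  }
}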

Network I/O bottlenecks. The Leader/Followers pattern, as described in the Implementation section, serializes processing by allowing only a single thread at a time to wait on the handle set. In some environments, this design could become a bottleneck because only one thread at a time can demultiplex I/O events. In practice, however, this may not be a problem because most of the I/O-intensive processing is performed by the operating system kernel. Thus, application-level I/O operations can be performed rapidly.

See Also

The Reactor pattern (179) often forms the core of Leader/Followers pattern implementations. However, the Reactor pattern can be used in lieu of the Leader/Followers pattern when each event only requires a short amount of time to process. In this case, the additional scheduling complexity of the Leader/Followers pattern is unnecessary.

The Proactor pattern (215) defines another model for demultiplexing asynchronous event completions concurrently. It can be used instead of the Leader/Followers pattern:
§ When an operating system supports asynchronous I/O efficiently and
§ When programmers are comfortable with the asynchronous inversion of control associated with the Proactor pattern.

The Half-Sync/Half-Async (423) and Active Object (369) patterns are two other alternatives to the Leader/Followers pattern. These patterns may be a more appropriate choice than the Leader/Followers pattern:
§ When there are additional synchronization or ordering constraints that must be addressed by reordering requests in a queue before they can be processed by threads in the pool and/or
§ When event sources cannot be waited for by a single event demultiplexer efficiently.

The Controlled Reactor pattern [DeFe99] includes a performance manager that controls the use of threads for event handlers according to a user's specification and may be an alternative when controlled performance is an important objective.

Credits

Michael Kircher, Carlos O'Ryan, and Irfan Pyarali are the co-authors of the original version of the Leader/Followers pattern. Thanks to Ed Fernandez for his comments that helped improve this version of the pattern.

[8]For instance, some operating systems support asynchronous I/O by spawning a thread for each asynchronous operation, thereby defeating the potential performance benefits of asynchrony.


[9]However, WaitForMultipleObjects() does not by itself address the problem of notifying a particular thread when an event is available, which is necessary to support the bound thread/handle association discussed in the Variants section.

[10]In contrast, the Half-Sync/Half-Reactive thread pool described in the Example section must allocate each request dynamically from a shared heap because the request is passed between threads.

Thread-Specific Storage

The Thread-Specific Storage design pattern allows multiple threads to use one 'logically global' access point to retrieve an object that is local to a thread, without incurring locking overhead on each object access.

Also Known As

Thread-Local Storage

Example

Consider the design of a multi-threaded network logging server that remote client applications use to record information about their status centrally within a distributed system. Unlike the logging server shown in the Reactor pattern example (179), which demultiplexed all client connections iteratively within a single thread, this logging server uses a thread-per-connection [Sch97] concurrency model to process requests concurrently.

In the thread-per-connection model a separate thread is created for each client connection. Each thread reads logging records from its associated TCP Socket, processes these records and writes them to the appropriate output device, such as a log file or a printer.

Each logging server thread is also responsible for detecting and reporting any low-level network conditions or system errors that occur when performing I/O. Many operating systems, such as UNIX and Windows NT, report this low-level information to applications via a global access point, called errno. When an error or unusual condition occurs during system calls, such as read() or write(), the operating system sets errno to indicate what has happened and returns a specific status value, such as −1. Applications must test for these return values and then check errno to determine what type of error or unusual condition occurred.


Consider the following C code fragment that receives client logging records from a TCP socket handle set to non-blocking mode [Ste98]:

// One global <errno> per-process.
extern int errno;

void *logger (HANDLE socket) {
  // Read logging records until connection is closed.
  for (;;) {
    char log_record[MAXREC];

    if (recv (socket, log_record, MAXREC, 0) == -1) {
      // Check to see why <recv> failed.
      if (errno == EWOULDBLOCK)
        sleep (1); // Try getting data later.
      else
        // Display error result.
        cerr << "recv failed, errno=" << errno;
    }
    else {
      // Normal case ...
    }
  }
}

If recv() returns -1, the logger code checks errno to determine what happened and decide how to proceed.

Although implementing errno at global scope works reasonably well for single-threaded applications, it can incur subtle problems for multi-threaded applications. In particular, race conditions in preemptive multi-threaded systems can cause an errno value set in one thread to be interpreted erroneously in other threads. If multiple threads execute the logger() function simultaneously erroneous interactions may occur.

For example, assume that thread T1 invokes a non-blocking recv() call that returns −1 and sets errno to EWOULDBLOCK, which indicates that no data is currently queued on the Socket. Before T1 can check for this case, however, it is preempted and thread T2 starts running.

Assuming that T2 is then interrupted by an asynchronous signal, such as SIGALRM, it sets errno to EINTR. If T2 is preempted immediately because its time-slice is finished, T1 will falsely assume its recv() call was interrupted and perform the wrong action:


One apparent solution to this problem is to apply the Wrapper Facade pattern (47) to encapsulate errno with an object wrapper that contains a lock. The Scoped Locking idiom (325) can then be used to acquire the lock before setting or checking errno and to release it afterwards. Unfortunately, this design will not solve the race condition problem, because setting and checking the global errno value is not atomic. Instead, it involves the following two activities:

1. The recv() call sets errno. 2. The application checks errno to determine what action to take.

A more robust way to prevent race conditions is to improve the errno locking protocol. For example, the recv() system call could acquire a global errno_lock before it sets errno. Subsequently, when recv() returns, the application releases the errno_lock after it tests the value of errno. This solution is error-prone, however, because applications may forget to release errno_lock, causing starvation and deadlock. Also, because applications may need to check the status of errno frequently, the extra locking overhead will degrade performance significantly, particularly when an application happens to run in a single-threaded configuration.

What is needed, therefore, is a mechanism that transparently gives each thread its own local copy of 'logically global' objects, such as errno.

Context

Multi-threaded applications that frequently access data or objects that are logically global but whose state should be physically local to each thread.

Problem

Multi-threaded applications can be hard to program due to the complex concurrency control protocols needed to avoid race conditions, starvation and deadlocks [Lea99a]. Due to locking overhead, multi-threaded applications also often perform no better than single-threaded applications. In fact, they may perform worse, particularly on multi-processor platforms [SchSu95]. Two forces can arise in concurrent programs:

§ Multi-threaded applications should be both easy to program and efficient. In particular, access to data that is logically global but physically local to a thread should be atomic without incurring locking overhead for each access.[11]

§ As described in the Example section, operating systems often implement errno as a 'logically global' variable that developers program as if it were an actual global variable. To avoid race conditions, however, the memory used to store errno is allocated locally, once per thread.


§ Many legacy libraries and applications were written originally assuming a single thread of control. They therefore often pass data implicitly between methods via global objects, such as errno, rather than passing parameters explicitly. When retrofitting such code to run in multiple threads it is often not feasible to change existing interfaces and code in legacy applications.

§ Operating systems that return error status codes implicitly in errno cannot be changed easily to return these error codes explicitly without causing existing applications and library components to break.

Solution

Introduce a global access point for each thread-specific object, but maintain the 'real' object in storage that is local to each thread. Let applications manipulate these thread-specific objects only through their global access points.

Structure

The Thread-Specific Storage pattern is composed of six participants.

A thread-specific object is an instance of an object that can be accessed only by a particular thread.

For example, in operating systems that support multi-threaded processes, errno is an int that has a different instance in each thread.


A thread identifies a thread-specific object using a key that is allocated by a key factory. Keys generated by the key factory are assigned from a single range of values to ensure that each thread-specific object is 'logically' global.

For example, a multi-threaded operating system implements errno by creating a globally-unique key. Each thread uses this key to access its own local instance of errno implicitly.

A thread-specific object set contains the collection of thread-specific objects that are associated with a particular thread. Each thread has its own thread-specific object set. Internally, this thread-specific object set defines a pair of methods, which we call set() and get(), to map the globally-managed set of keys to the thread-specific objects stored in the set. Clients of a thread-specific object set can obtain a pointer to a particular thread-specific object by passing a key that identifies the object as a parameter to get(). The client can inspect or modify the object via the pointer returned by the get() method. Similarly, clients can add a pointer to a thread-specific object into the object set by passing the pointer to the object and its associated key as parameters to set().

An operating system's threads library typically implements the thread-specific object set. This set contains the errno data, among other thread-specific objects.


A thread-specific object proxy [GoF95] [POSA1] can be defined to enable clients to access a specific type of thread-specific object as if it were an ordinary object. If proxies are not used, clients must access thread-specific object sets directly and use keys explicitly, which is tedious and error-prone. Each proxy instance stores a key that identifies the thread-specific object uniquely. Thus, there is one thread-specific object per-key, per-thread.

A thread-specific object proxy exposes the same interface as its associated thread-specific object. Internally, the interface methods of the proxy first use the set() and get() methods provided by its thread-specific object set to obtain a pointer to the thread-specific object designated by the key stored in the proxy. After a pointer to the appropriate thread-specific object has been obtained, the proxy then delegates the original method call to it.

For example, errno is implemented as a preprocessor macro that plays the role of the proxy and shields applications from thread-specific operations.
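A sketch of how such a macro-based proxy might be written on top of POSIX Pthreads is shown below. The names my_errno and my_errno_location() are hypothetical; they merely illustrate the technique of routing what looks like a global variable through a per-thread lookup, not how any particular operating system actually defines errno.

#include <pthread.h>
#include <stdlib.h>

static pthread_key_t errno_key;
static pthread_once_t errno_once = PTHREAD_ONCE_INIT;

static void create_errno_key () {
  // <free> serves as the thread exit hook for the per-thread slot.
  pthread_key_create (&errno_key, free);
}

// Return the address of the calling thread's private error-number slot,
// creating the key once and the slot lazily per thread.
static int *my_errno_location () {
  pthread_once (&errno_once, create_errno_key);
  int *slot = static_cast<int *> (pthread_getspecific (errno_key));
  if (slot == 0) {
    slot = static_cast<int *> (calloc (1, sizeof (int)));
    pthread_setspecific (errno_key, slot);
  }
  return slot;
}

// The 'proxy': application code reads and writes my_errno as if it were an
// ordinary global variable, but every thread gets its own copy.
#define my_errno (*my_errno_location ())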

Application threads are clients that use thread-specific object proxies to access particular thread-specific objects that reside in thread-specific storage. To an application thread, the method appears to be invoked on an ordinary object, when in fact it is invoked on a thread-specific object. Multiple application threads can use the same thread-specific object proxy to access their unique thread-specific objects. A proxy uses the identifier of the application thread that calls its interface methods to differentiate between the thread-specific objects it encapsulates.

For example, the thread that runs the logger function in the Example section is an application thread.

The following class diagram illustrates the general structure of the Thread-Specific Storage pattern:


The participants in the Thread-Specific Storage pattern can be modeled conceptually as a two-dimensional matrix that has one row per key and one column per thread. The matrix entry at row k and column t yields a pointer to the corresponding thread-specific object. Creating a key is analogous to adding a row to the matrix; creating a new thread is analogous to adding a column.

A thread-specific object proxy works in conjunction with the thread-specific object set to provide application threads with a type-safe mechanism to access a particular object located at row k and column t. The key factory maintains a count of how many keys have been used. A thread-specific object set contains the entries in one column.

Note that the model above is only an analogy. In practice, implementations of the Thread-Specific Storage pattern do not use two-dimensional matrices, because keys are not necessarily consecutive integers. The entries in the thread-specific object set may also reside in their corresponding thread, rather than in a global two-dimensional matrix. It is helpful to visualize the structure of the Thread-Specific Storage pattern as a two-dimensional matrix, however. We therefore refer to this metaphor in the following sections.

Dynamics

There are two general scenarios in the Thread-Specific Storage pattern: creating and accessing a thread-specific object. Scenario I describes the creation of a thread-specific object:
§ An application thread invokes a method defined in the interface of a thread-specific object proxy.
§ If the proxy does not yet have an associated key it asks the key factory to create a new key. This key identifies the associated thread-specific object uniquely in each thread's object set. The proxy then stores the key, to optimize subsequent method invocations by application threads.

§ The thread-specific object proxy creates a new object dynamically. It then uses the thread-specific object set's set() method to store a pointer to this object in the location designated by the key.

§ The method that was invoked by the application thread is then executed, as shown in Scenario II.


Scenario II describes how an application thread accesses an existing thread-specific object:
§ An application thread invokes a method on a thread-specific object proxy.
§ The thread-specific object proxy passes its stored key to the get() method of the application thread's thread-specific object set. It then retrieves a pointer to the corresponding thread-specific object.

§ The proxy uses this pointer to delegate the original method call to the thread-specific object. Note that no locking is necessary, because the object is referenced through a pointer that is accessed only within the client application thread itself.
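The two scenarios can be made concrete with a small, hypothetical proxy sketch built directly on the POSIX Pthreads thread-specific storage API. It creates the per-thread object lazily on first access (Scenario I) and otherwise simply retrieves and delegates to the calling thread's own instance (Scenario II). This is a simplified illustration, not the full proxy design developed in the Implementation section below.

#include <pthread.h>

// A minimal thread-specific object proxy: TSS_Proxy<T> gives every thread
// its own lazily created instance of T behind one 'logically global'
// access point.
template <class T>
class TSS_Proxy {
public:
  // Each proxy instance allocates its own key (eagerly here, for brevity;
  // the pattern describes lazy key creation guarded by a lock).
  TSS_Proxy () { pthread_key_create (&key_, &TSS_Proxy::destroy); }

  // Return the calling thread's own instance of T, creating it on the
  // thread's first access. No locking is needed afterwards because each
  // thread only ever touches its own object.
  T *operator-> () {
    T *obj = static_cast<T *> (pthread_getspecific (key_));
    if (obj == 0) {
      obj = new T;                      // the thread-specific object
      pthread_setspecific (key_, obj);  // store it under this proxy's key
    }
    return obj;
  }

private:
  // Thread exit hook invoked by the threads library for each thread.
  static void destroy (void *p) { delete static_cast<T *> (p); }

  pthread_key_t key_;                   // identifies T's 'row' in the matrix
};

A 'logically global' declaration such as static TSS_Proxy<Request_Counter> request_count then lets each thread call, say, request_count->increment() on its own private counter without any locking; Request_Counter and increment() are of course hypothetical names.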

Implementation

Implementing the Thread-Specific Storage pattern centers on implementing thread-specific object sets and thread-specific object proxies. These two components create the mechanisms for managing and accessing objects residing in thread-specific storage. We therefore describe their implementation—including potential alternatives—as two separate activities, starting with thread-specific object sets and then covering thread-specific object proxies.

The thread-specific objects themselves, as well as the application code that accesses them, are defined by application developers. We therefore do not provide general implementation activities for these pattern participants. In the Example Resolved section, however, we use our multi-threaded logging server example to illustrate how applications can program the Thread-Specific Storage pattern effectively.

1. Implement the thread-specific object sets. This activity is divided into six sub-activities:

1.1 Determine the type of the thread-specific objects. In terms of our two-dimensional matrix analogy, a thread-specific object is an entry in the matrix that has the following properties:
§ Its row number corresponds to the key that uniquely identifies the 'logically global' object.
§ Its column number corresponds to a particular application thread identifier.

To make implementations of the Thread-Specific Storage pattern more generic, a pointer to a thread-specific object is stored rather than storing the object itself. These pointers are often 'loosely typed', such as C/C++ void *'s, so that they can point to any type of object. Although loosely typed void *'s are highly flexible, they are hard to program correctly. Implementation activity 2 (491) therefore describes several strategies to encapsulate void *'s with less error-prone, strongly-typed proxy classes.

1.2 Determine where to store the thread-specific object sets. In terms of our two-dimensional matrix analogy, the thread-specific object sets correspond to matrix columns, which are allocated one per application thread. Each application thread identifier therefore designates one column in our conceptual two-dimensional matrix. Each thread-specific object set can be stored either externally to all threads or internally to its own thread. There are pros and cons for each strategy:

§ External to all threads. This strategy maps each application thread's identifier to a global table of thread-specific object sets that are stored externally to all threads. Note that an application thread can obtain its own thread identifier by calling an API in the threading library. Implementations of external thread-specific object sets can therefore readily determine which thread-specific object set is associated with a particular application thread.

Depending on the implementation of the external table strategy, threads can access thread-specific object sets in other threads. At first, this design may appear to defeat the whole point of the Thread-Specific Storage pattern, because the objects and pointers themselves do not reside in thread-specific storage. It may be useful, however, if the thread-specific storage implementation can recycle keys when they are no longer needed, for example if an application no longer needs to access a global object, such as errno, for some reason.

A global table facilitates access to all thread-specific object sets from one 'clean-up' thread, to remove the entries corresponding to the recycled key. Recycling keys is particularly useful for Thread-Specific Storage pattern implementations that support only a limited number of keys. For example, Windows NT has a limit of 64 keys per process. Real-time operating systems often support even fewer keys.

One drawback of storing thread-specific object sets in a global table external to all threads is the increased overhead for accessing each thread-specific object. This overhead stems from the synchronization mechanisms needed to avoid race conditions every time the global table containing all the thread-specific object sets is modified. In particular, serialization is necessary when the key factory creates a new key, because other application threads may be creating keys concurrently. After the appropriate thread-specific object set is identified, however, the application thread need not perform any more locking operations to access a thread-specific object in the set.

§ Internal to each thread. This strategy requires each thread to store a thread-specific object set with its other internal state, such as its run-time thread stack, program counter, general-purpose registers, and thread identifier. When a thread accesses a thread-specific object, the object is retrieved by using its associated key as an index into the thread's internal thread-specific object set. Unlike the external strategy described above, no serialization is required when the thread-specific object set is stored internally to each thread. In this case all accesses to a thread's internal state occur within the thread itself.

Storing the thread-specific object set locally in each thread requires more state per-thread, however, though not necessarily more total memory consumption. As long as the growth in size does not increase the cost of thread creation, context switching or destruction significantly, the internal thread-specific object strategy can be more efficient than the external strategy.

If an operating system provides an adequate thread-specific storage mechanism, thread-specific object sets can be implemented internally to each thread via the native operating system mechanism. If not, thread-specific object sets can be implemented externally using a two-level mapping strategy. In this strategy, one key in the native thread-specific storage mechanism is dedicated to point to a thread-specific object set implemented externally to each thread.
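A sketch of this two-level mapping, assuming the POSIX Pthreads API as the native mechanism and a simple std::map as the external set, might look as follows; the function names and the choice of container are illustrative only.

#include <pthread.h>
#include <map>

// Level 2: the externally implemented thread-specific object set, mapping
// this library's own keys to object pointers for one thread.
typedef std::map<int, void *> Object_Set;

// Level 1: a single native key whose per-thread value points at the
// calling thread's Object_Set.
static pthread_key_t set_key;
static pthread_once_t set_once = PTHREAD_ONCE_INIT;

static void destroy_set (void *p) { delete static_cast<Object_Set *> (p); }
static void create_set_key () { pthread_key_create (&set_key, destroy_set); }

// Return the calling thread's object set, creating it on first use.
static Object_Set *my_object_set () {
  pthread_once (&set_once, create_set_key);
  Object_Set *set =
    static_cast<Object_Set *> (pthread_getspecific (set_key));
  if (set == 0) {
    set = new Object_Set;
    pthread_setspecific (set_key, set);
  }
  return set;
}

// set()/get() on the external set; no locking is required because each
// thread only ever accesses its own Object_Set.
void tss_set (int key, void *obj) { (*my_object_set ())[key] = obj; }

void *tss_get (int key) {
  Object_Set *set = my_object_set ();
  Object_Set::iterator i = set->find (key);
  return i == set->end () ? 0 : i->second;
}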

1.3 Define a data structure to map application thread identifiers to thread-specific object sets. In terms of the two-dimensional matrix analogy, application thread identifiers map to the columns in the matrix that represent thread-specific object sets.

Application thread identifiers can range in value from very small to very large. A large range in values presents no problem for object sets that reside internally to each thread. In this case the thread identifier is associated implicitly with the corresponding thread-specific object set contained within the thread's state. Thus, there is no need to implement a separate data structure to map application thread identifiers to thread-specific object sets.

For thread-specific object sets residing externally to all threads, however, it may be impractical to have a fixed-size array with an entry for every possible thread identifier value. This is one reason why the two-dimensional matrix analogy is just a conceptual model rather than a realistic implementation strategy. In this case it may be more space efficient to use a dynamic data structure that maps thread identifiers to thread-specific object sets.

For example, a common strategy is to compute a hash function using the thread identifier to obtain an offset to a hash table. The entry at this offset contains a chain of tuples that map thread identifiers to their corresponding thread-specific object sets.
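Such a hash-based mapping could be sketched as follows, with an illustrative table size, chained buckets and a global lock that serializes modifications to the table; treating the pthread_t thread identifier as an integral value for hashing is an assumption that portable code would need to revisit.

#include <pthread.h>

struct Object_Set;                       // a thread's external object set

// One chain node: maps a thread identifier to its object set.
struct Map_Entry {
  pthread_t tid;
  Object_Set *set;
  Map_Entry *next;
};

const int TABLE_SIZE = 256;              // illustrative
static Map_Entry *table_[TABLE_SIZE];
static pthread_mutex_t table_lock_ = PTHREAD_MUTEX_INITIALIZER;

// Hash a thread identifier to a bucket index (assumes pthread_t is
// convertible to an integral value).
static int hash_tid (pthread_t tid) {
  return static_cast<int> ((unsigned long) tid % TABLE_SIZE);
}

// Find the calling thread's object set, inserting <new_set> if the
// thread has no entry yet.
Object_Set *lookup_object_set (Object_Set *new_set) {
  pthread_t self = pthread_self ();
  int bucket = hash_tid (self);

  pthread_mutex_lock (&table_lock_);
  for (Map_Entry *e = table_[bucket]; e != 0; e = e->next)
    if (pthread_equal (e->tid, self)) {
      pthread_mutex_unlock (&table_lock_);
      return e->set;
    }

  Map_Entry *entry = new Map_Entry;
  entry->tid = self;
  entry->set = new_set;
  entry->next = table_[bucket];
  table_[bucket] = entry;
  pthread_mutex_unlock (&table_lock_);
  return new_set;
}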

1.4 Define the data structure that maps keys to thread-specific objects within a thread-specific object set. In terms of the two-dimensional matrix analogy, this mapping identifies a particular matrix entry (the thread-specific object) according to its row (the key) at a particular column (the thread-specific object set associated with a particular application thread identifier). For both the external and internal thread-specific object set implementations, we must select either a fixed-sized or a variable-sized data structure for this mapping.

The thread-specific object set can be stored in a fixed-size array if the range of thread-specific key values is relatively small and contiguous. The POSIX Pthreads standard [IEEE96], for example, defines a standard macro, _POSIX_THREAD_KEYS_MAX, that sets the maximum number of keys supported by a Pthreads implementation. If the size defined by this macro is small and fixed, for example 64 keys, the lookup time can be O(1) by indexing into the thread-specific object set array directly using the key that identifies a thread-specific object.

Some thread-specific storage implementations provide a range of thread-specific keys that is large and nearly unbounded, however. Solaris threads, for example, have no predefined limit on the number of thread-specific storage keys in an application process. Solaris therefore uses a variable-sized data structure, such as a hash table, to map keys to thread-specific objects. Although this data structure is more flexible than a fixed-size array, it can increase the overhead of managing the thread-specific object set when many keys are allocated.

The following code shows how thread-specific object sets can be implemented internally within each thread using a fixed-sized array of thread-specific objects that are stored as void *'s. Using this internal design means that it is not necessary to map application thread identifiers to thread-specific object sets. Instead, we need only provide a data structure that maps keys to thread-specific objects within a thread-specific object set.

All the C code examples shown in this Implementation section are adapted from a publicly available user-level library implementation [Mue93] of POSIX Pthreads [IEEE96]. For example, the object_set_ data structure corresponding to implementation activity 1.4 (487) is contained within the following thread_state struct. This struct is used by the Pthreads library implementation to store the state of each thread:

struct thread_state {
  // Thread-specific 'object' set implemented via void *'s.
  void *object_set_[_POSIX_THREAD_KEYS_MAX];

  // ... Other thread state.
};

In addition to keeping track of the array of pointers to thread-specific storage objects, an instance of thread_state also includes other thread state. This includes a pointer to the thread's stack and space to store thread-specific registers that are saved and restored during a context switch. Our Pthreads implementation also defines several macros to simplify its internal programming:

// Note that <errno>'s key number is 0, i.e.,
// it is in first slot of array object_set_.
#define ERRNO_KEY 0

// Define a macro that's used internally to the Pthreads
// implementation to set and get <errno> values.
#define INTERNAL_ERRNO \
  (pthread_self ()->object_set_[ERRNO_KEY])

The pthread_self() function used by the INTERNAL_ERRNO macro is an internal implementation subroutine that returns a pointer to the context of the currently active thread.

1.5 Define the key factory. In our two-dimensional matrix analogy, keys correspond to rows in the matrix. The key factory creates a new key that identifies a 'logically global' object (row) uniquely. The state of this object will physically reside in storage that is local to each thread.

For a particular object that is logically global yet physically local to each thread, the same key value k is used by all threads to access their corresponding thread-specific object. The count of the number of keys currently in use can therefore be maintained in storage that is global to all threads.

The code below illustrates our Pthreads library implementation of the pthread_key_create() key factory. Keys are represented by integer values:

typedef int pthread_key_t;

A static variable keeps track of the current key count within the thread-specific storage implementation:

// All threads share the same key counter.
static pthread_key_t total_keys_ = 0;

The total_keys_ variable is incremented automatically every time a new thread-specific key is required, which is equivalent to adding a new row to our conceptual two-dimensional matrix. Next, we define the key factory itself:

int pthread_key_create (pthread_key_t *key,
                        void (*thread_exit_hook)(void *)) {
  if (total_keys_ >= _POSIX_THREAD_KEYS_MAX) {
    // Use our internal <errno> macro.
    INTERNAL_ERRNO = ENOMEM;
    return -1;
  }
  thread_exit_hook_[total_keys_] = thread_exit_hook;
  *key = total_keys_++;
  return 0;
}

The pthread_key_create() function is a key factory. It allocates a new key that identifies a thread-specific data object uniquely. This function requires no internal synchronization, because it must be called with an external lock held, as shown by the TS_Proxy in implementation activity 2.1 (492).

When a key is created, the pthread_key_create() function allows the calling thread to associate a thread_exit_hook with the new key. This hook is a pointer to a function that will be used to delete any dynamically allocated thread-specific objects that are associated with the key. When a thread exits, the Pthreads library calls this function pointer automatically for each key that has registered an exit hook.

To implement this feature, an array of function pointers to 'thread exit hooks' can be stored as a static global variable in the Pthreads library:

// Array of exit hook function pointers that can be used
// to deallocate thread-specific data objects.
static void (*thread_exit_hook_[_POSIX_THREAD_KEYS_MAX]) (void *);

The pthread_exit() function shows how exit hook functions are called back just before a thread exits:

// Terminate the thread and call thread exit hooks.
void pthread_exit (void *status) {
  // ...
  for (i = 0; i < total_keys_; ++i)
    if (pthread_self ()->object_set_[i]
        && thread_exit_hook_[i])
      // Indirect pointer to function call.
      (*thread_exit_hook_[i])
        (pthread_self ()->object_set_[i]);

  // Terminate the thread and clean up internal
  // resources...
}

Across its keys, an application can register the same function pointer, different function pointers, or any combination of the two. When a thread exits, the Pthreads implementation calls back, for each key, the function that was registered when that key was created. Because deleting dynamically allocated thread-specific objects is common, applications often implement thread exit hooks as follows:

void cleanup_tss_object (void *ptr) {
  // This cast is necessary to invoke the
  // appropriate destructor (if one exists).
  delete (Object_Foo *) ptr;
}

1.6 Define methods to store and retrieve thread-specific objects from a thread-specific object set. In terms of the matrix analogy, these two methods set and get the value of matrix entries. The set() method stores a void* at matrix entry [k,t], whereas the get() method retrieves a void* at matrix entry [k,t]. In thread-specific storage implementations, k is passed as a key argument and t is the implicit thread identifier returned by a call to pthread_self().

The pthread_setspecific() function is a set() method that stores a void* using the key passed by the client application thread that calls it:

int pthread_setspecific (pthread_key_t key, void *value) {
  if (key < 0 || key >= total_keys_) {
    // Use our internal <errno> macro.
    INTERNAL_ERRNO = EINVAL;
    return -1;
  }
  // Store value into appropriate slot in the
  // thread-specific object set.
  pthread_self ()->object_set_[key] = value;
  return 0;
}

Similarly, the pthread_getspecific() function retrieves a void* using the key passed by the client application thread:

int pthread_getspecific (pthread_key_t key, void **value) {
  if (key < 0 || key >= total_keys_) {
    // Use our internal <errno> macro.
    INTERNAL_ERRNO = EINVAL;
    return -1;
  }
  *value = pthread_self ()->object_set_[key];
  return 0;
}

In this implementation, neither function requires any locks to access its thread-specific object set, because the set resides internally within the state of each thread.
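As a usage illustration (not part of the library implementation), the following sketch shows how application code might employ these functions together, using the two-argument pthread_getspecific() signature defined above rather than the standard single-argument POSIX signature. The names count_key and get_request_count() are hypothetical.

#include <stdlib.h>

// Created once at start-up, e.g. while holding an external lock, via:
//   pthread_key_create (&count_key, free);
static pthread_key_t count_key;

// Each thread that calls this function sees its own counter.
int *get_request_count (void) {
  void *value = 0;
  pthread_getspecific (count_key, &value);
  if (value == 0) {
    // First access in this thread: allocate a thread-specific counter.
    // The <free> exit hook registered with the key reclaims it on exit.
    value = calloc (1, sizeof (int));
    pthread_setspecific (count_key, value);
  }
  return (int *) value;
}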


2. Implement thread-specific object proxies. In theory, the thread-specific object sets written in C above are sufficient to implement the Thread-Specific Storage pattern. In practice, however, it is undesirable to rely on such low-level C function APIs for two reasons:

§ Although the thread-specific storage APIs of popular threading libraries, such as POSIX Pthreads, Solaris threads, and Win32 threads, are similar, their semantics differ subtly. For example, Win32 threads, unlike POSIX Pthreads and Solaris threads, do not provide a reliable way to deallocate objects allocated in thread-specific storage when a thread exits. In Solaris threads, conversely, there is no API to delete a key. These diverse semantics make it hard to write code that runs portably on all three platforms.

§ The POSIX Pthreads, Solaris, and Win32 thread-specific storage APIs store pointers to thread-specific objects as void*'s. Although this approach provides maximal flexibility, it is error-prone because void*'s eliminate type-safety.

To overcome these limitations the Thread-Specific Storage pattern defines a thread-specific object proxy. Each proxy applies the Proxy pattern [GoF95] [POSA1] to define an object that acts as a 'surrogate' for a thread-specific object. Application threads that invoke methods on a proxy appear to access an ordinary object, when in fact the proxy forwards the methods to a thread-specific object. This design shields applications from knowing when or how thread-specific storage is being used. It also allows applications to use higher-level, type-safe, and platform-independent wrapper facades (47) to access thread-specific objects managed by lower-level C function APIs.

The implementation of thread-specific object proxies can be divided into three sub-activities:

2.1 Define the thread-specific object proxy interfaces. There are two strategies for designing a thread-specific object proxy interface: polymorphism and parameterized types.

§ Polymorphism. In this strategy an abstract proxy class declares and implements the data structures and methods that every proxy supports. Examples include the key that the thread-specific object set associates with a particular 'logically global' object, or the lock needed to avoid race conditions when creating this key.

Access to the concrete methods offered by thread-specific objects is provided by subclasses of the general proxy, with one class for each type of thread-specific object. Before forwarding client application requests to the corresponding methods in the thread-specific object, a proxy first retrieves an object pointer from the thread-specific object set via the key stored in the proxy.

Using polymorphism to implement a proxy is a common strategy [POSA1]. It can incur overhead, however, due to the extra level of indirection caused by dynamic binding.

§ Parameterized types. In this strategy the proxy can be parameterized by the types of objects that will reside in thread-specific storage. As with the polymorphism strategy described above, the proxy mechanism only declares and implements the data structures and methods every proxy supports. It also performs all necessary operations on the thread-specific object set before invoking the designated method on the thread-specific object. Parameterization can remove the indirection associated with polymorphism, which can improve the proxy's performance.


A key design problem that arises when using the parameterized type strategy is selecting a convenient mechanism to access the methods of thread-specific objects encapsulated by a proxy. In particular, different types of thread-specific objects have different interfaces. The mechanism for accessing these objects cannot therefore define any concrete service methods. This differs from the polymorphism strategy described above.

One way to address this problem is to use smart pointers [Mey95], such as operator->, also known as the C++ arrow operator [Str97]. This operator allows client application threads to access the proxy as if they were accessing the thread-specific object directly. The operator-> method receives special treatment from the C++ compiler: it first obtains a pointer to the appropriate type of thread-specific object, then delegates to it the original method invoked on the proxy.

Another generic way to access the methods of thread-specific objects is to apply the Extension Interface pattern (141). This solution introduces a generic method for the proxy that allows clients to retrieve the concrete interfaces supported by the configured thread-specific object.

In our example we use C++ parameterized types to define a type-safe template that applications can use to instantiate thread-specific object proxies with concrete thread-specific objects:

template <class TYPE>
class TS_Proxy {
public:
  // Constructor and destructor.
  TS_Proxy ();
  ~TS_Proxy ();

  // Define the C++ '->' and '*' operators to access
  // the thread-specific <TYPE> object.
  TYPE *operator-> () const;
  TYPE &operator* () const;
private:
  // Key that uniquely identifies the 'logically global'
  // object that 'physically' resides locally in
  // thread-specific storage.
  mutable pthread_key_t key_;

  // "First time in" flag.
  mutable bool once_;

  // Avoid race conditions during initialization.
  mutable Thread_Mutex keylock_;

  // A static cleanup hook method that deletes
  // dynamically allocated memory.
  static void cleanup_hook (void *ptr);
};

This thread-specific proxy template is parameterized by the type of object that will be accessed via thread-specific storage. In addition, it defines the C++ smart pointer operator-> to access a thread-specific object of type TYPE.
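The class above also declares an operator*, whose definition is not shown in this section. Assuming it simply reuses the operator-> defined in implementation activity 2.3 (496), a plausible sketch is:

template <class TYPE>
TYPE &TS_Proxy<TYPE>::operator* () const {
  // Delegate to operator->, which locates (and, if necessary, creates)
  // the calling thread's <TYPE> instance, then return a reference to it.
  return *(operator-> ());
}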

2.2 Implement the creation and destruction of the thread-specific object proxy. Regardless of whether we apply the polymorphism or parameterized type strategy to define the thread-specific object proxy, we must manage the creation and destruction of thread-specific object proxies.

The constructor for our thread-specific object proxy template class is minimal: it simply initializes the object's data members:

template <class TYPE>
TS_Proxy<TYPE>::TS_Proxy (): once_ (false), key_ (0) { }

In general, a proxy's constructor does not allocate the key or a new thread-specific object instance, for two reasons:

§ Thread-specific storage semantics. A thread-specific object proxy is often created by a thread, for example the application's main thread, that is different from the thread(s) that use the proxy. There is thus no benefit in pre-initializing a new thread-specific object in the constructor, because that instance would only be accessible to the thread that created it.

§ Deferred creation. On some operating systems, keys are limited resources and should not be allocated until absolutely necessary. Their creation should therefore be deferred until the first time a method of the proxy is invoked. In our example implementation this point in time occurs in the operator-> method.

The destructor for the thread-specific object proxy presents us with several tricky design issues. The 'obvious' solution is to release the key that was allocated by the key factory. There are several problems with this approach, however:

§ Non-portability. It is hard to write a proxy destructor that releases keys portably. For example, Solaris threads, unlike Win32 and POSIX Pthreads, lack an API to release keys that are no longer needed.


§ Race conditions. One reason that Solaris threads do not provide an API to release keys is that such an API is hard to implement efficiently and correctly. The problem is that each thread maintains independent copies of the objects referenced by a key. Only after all threads have exited and this memory has been reclaimed can a key be released safely.

As a result of these problems, the proxy's destructor is generally a 'no-op'. This means that we do not recycle keys in this implementation. In lieu of a destructor, therefore, we implement a thread-exit hook function, as discussed in implementation activity 1.5 (489). This hook is dispatched automatically by the thread-specific storage implementation when a thread exits. It deletes the thread-specific object, thereby ensuring that the destructor of the object is invoked.

The destructor of our TS_Proxy class is a 'no-op':

template <class TYPE>
TS_Proxy<TYPE>::~TS_Proxy () { }

To ensure the right destructor is called, the thread-exit hook casts its ptr argument to a pointer to the appropriate TYPE before deleting it:

template <class TYPE>
void TS_Proxy<TYPE>::cleanup_hook (void *ptr) {
  // This cast invokes the destructor (if one exists).
  delete (TYPE *) ptr;
}

Note that cleanup_hook() is defined as a static method in the TS_Proxy class. Because it is static, it can be passed as a pointer-to-function thread exit hook to pthread_key_create().

2.3 Implement the access to the thread-specific object. As we discussed earlier, there are two general strategies—polymorphism and parameterized types—for accessing the methods of a thread-specific object that is represented by a proxy.

When using the polymorphism strategy, the interface of each concrete proxy must include all methods offered by the thread-specific object that is represented by this class. Method implementations in a concrete proxy generally perform four steps:

§ Create a new key, if no such thread-specific object has been created yet. We must avoid race conditions by preventing multiple threads from creating a new key for the same TYPE of thread-specific object simultaneously. We can resolve this problem by applying the Double-Checked Locking Optimization pattern (353).

§ The method must next use the key stored by the proxy to get the thread-specific object via its thread-specific object set.

§ If the object does not yet exist, it is created 'on-demand'.


§ The requested operation is forwarded to the thread-specific object. Any operation results are returned to the client application thread.

To avoid repeating this code within each proxy method, we recommend introducing a helper method in the thread-specific object proxy base class that implements these general steps.
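The following is a minimal sketch of such a helper method in a hypothetical polymorphic proxy base class. The names TS_Proxy_Base, object(), make_object(), Logger, and Logger_Proxy are assumptions, and the sketch reuses the Guard, Thread_Mutex, and Pthreads functions defined earlier in this Implementation section. Concrete proxies implement the fourth step by casting the returned pointer and forwarding the call.

// A hypothetical thread-specific object used for illustration only.
class Logger {
public:
  int last_error () { return 0; /* ... */ }
};

class TS_Proxy_Base {
protected:
  TS_Proxy_Base (): once_ (false), key_ (0) { }
  virtual ~TS_Proxy_Base () { }

  // Factory method overridden by each concrete proxy to create the
  // appropriate type of thread-specific object on demand.
  virtual void *make_object () const = 0;

  // Steps one to three: create the key once, then look up (or create)
  // the calling thread's object.
  void *object () const {
    if (!once_) {
      // Double-Checked Locking Optimization (353) via Scoped Locking (325).
      Guard<Thread_Mutex> guard (keylock_);
      if (!once_) {
        // A thread exit hook could be registered here, as shown in
        // implementation activity 1.5 (489).
        pthread_key_create (&key_, 0);
        once_ = true;
      }
    }
    void *obj = 0;
    pthread_getspecific (key_, &obj);
    if (obj == 0) {
      obj = make_object ();
      pthread_setspecific (key_, obj);
    }
    return obj;
  }

private:
  mutable bool once_;
  mutable pthread_key_t key_;
  mutable Thread_Mutex keylock_;
};

// The fourth step in a concrete proxy: forward to the thread-specific object.
class Logger_Proxy : public TS_Proxy_Base {
public:
  int last_error () {
    return static_cast<Logger *> (object ())->last_error ();
  }
protected:
  void *make_object () const { return new Logger; }
};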

When using parameterized types to instantiate a generic proxy, the smart pointer and Extension Interface pattern (141) strategies described in implementation activity 2.1 (492) can be applied to implement a general access mechanism for any thread-specific object's methods. Analogously to the polymorphism strategy, the general access mechanism must follow the implementation steps described above.

By using the parameterized type strategy and overloading the C++ arrow operator, operator->, applications can invoke methods on instances of TS_Proxy as if they were invoking a method on an instance of the TYPE parameter. The C++ arrow operator controls all access to the thread-specific object of class TYPE. It performs most of the work, as follows:

template <class TYPE>
TYPE *TS_Proxy<TYPE>::operator-> () const {
  TYPE *tss_data = 0;

  // Use the Double-Checked Locking Optimization
  // pattern to avoid excessive locking.
  if (!once_) {
    // Use Scoped Locking idiom to ensure <keylock_>
    // is acquired to serialize critical section.
    Guard<Thread_Mutex> guard (keylock_);
    if (!once_) {
      pthread_key_create (&key_, &cleanup_hook);
      // Must come last so that other threads
      // don't use the key until it's created.
      once_ = true;
    }
    // <Guard> destructor releases the lock.
  }

  // Get data from thread-specific storage. Note that no
  // locks are required, because this thread's own copy
  // of the thread-specific object will be accessed.
  pthread_getspecific (key_, (void **) &tss_data);

  // Check if it's the first time in for this thread.
  if (tss_data == 0) {
    // Allocate memory dynamically off the heap.
    tss_data = new TYPE;
    // Store pointer in thread-specific storage.
    pthread_setspecific (key_, (void *) tss_data);
  }
  return tss_data;
}

The TS_Proxy template is a proxy that transforms ordinary C++ classes into type-safe classes whose instances reside in thread-specific storage. It combines the operator-> method with C++ features, such as templates, inlining, and overloading. In addition, it uses common concurrency control patterns and idioms, such as Double-Checked Locking Optimization (353), Scoped Locking (325), and Strategized Locking (333).

The Double-Checked Locking Optimization pattern is used in operator-> to test the once_ flag twice. Although multiple threads could access the same instance of TS_Proxy simultaneously, only one thread can validly create a key via the pthread_key_create() method. All threads then use this key to access their associated thread-specific object of the parameterized class TYPE. The operator-> method therefore uses a keylock_ of type Thread_Mutex to ensure that only one thread at a time executes pthread_key_create().

After key_ is created, no other locking is needed to access thread-specific objects, because the pthread_getspecific() and pthread_setspecific() functions both retrieve the thread-specific object of class TYPE from the state of the client application thread, which is independent from other threads. In addition to reducing locking overhead, the implementation of class TS_Proxy shown above shields application code from the fact that objects are local to the calling thread.

The implementations of the extension interface and polymorphic proxy approaches are similar to the generic smart pointer approach shown above. The polymorphic proxy approach simply forwards to a method of the thread-specific object and returns the result. Similarly, the extension interface approach returns an extension interface from the thread-specific object and passes this back to the client.

Example Resolved

The following application is similar to our original logging server from the Example section. The logger() function shown below is the entry point to each thread that has its own unique connection to a remote client application. The main difference is that the logger() function uses the TS_Proxy template class defined in implementation activity 2.3 (496) to access the errno value.

This template is instantiated with the following Error_Logger class:

class Error_Logger {
  // Define a simple logging API.
public:
  // Return the most recent error residing in
  // thread-specific storage.
  int last_error ();

  // Format and display a logging message.
  void log (const char *format, ...);
  // ...
};

The Error_Logger class defines the type of the 'logically' global, but 'physically' thread-specific, logger object, which is created via the following TS_Proxy thread-specific object proxy template:

static TS_Proxy<Error_Logger> my_logger;

The logger() function is called by each connection handling thread in the logging server. We use the SOCK_Stream class described in the Wrapper Facade pattern (47) to read data from the network connection, instead of accessing the lower-level C Socket API directly:

static void *logger (void *arg) {
  // Network connection stream.
  SOCK_Stream *stream = static_cast<SOCK_Stream *> (arg);

  // Read a logging record from the network connection
  // until the connection is closed.
  for (;;) {
    char log_record[MAXREC];

    // Check to see if the <recv> call failed, which
    // is signified by a return value of -1.
    if (stream->recv (log_record, MAXREC) == -1) {
      if (my_logger->last_error () == EWOULDBLOCK)
        // Sleep a bit and try again.
        sleep (1);
      else
        // Record error result.
        my_logger->log ("recv failed, errno = %d",
                        my_logger->last_error ());
    }
    else {
      // ... Other processing.
    }
  }
}

Consider the call to the my_logger->last_error() method above. The C++ compiler generates code that replaces this call with two method calls. The first is a call to the TS_Proxy::operator->, which returns the appropriate Error_Logger instance residing in thread-specific storage. The compiler then generates a second method call to the last_error() method of the Error_Logger object returned by the previous call. In this case, TS_Proxy behaves as a proxy that allows an application to access and manipulate the thread-specific error value as if it were an ordinary C++ object.
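Conceptually, and ignoring inlining, the single source-level call amounts to something like the following (illustrative only):

// First call: TS_Proxy<Error_Logger>::operator-> locates this thread's
// Error_Logger instance, creating it on first use.
Error_Logger *logger = my_logger.operator-> ();
// Second call: the method is then invoked on that thread-specific object.
int error = logger->last_error ();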

Variants

Starting with JDK 1.2, Java supports the Thread-Specific Storage pattern via class java.lang.ThreadLocal. An object of class java.lang.ThreadLocal is a thread-specific object proxy, which corresponds to one row in our two-dimensional matrix analogy. ThreadLocal objects are often created as static variables in a central location so that they are broadly visible. Internally, a ThreadLocal object maintains a hash table of entries for the thread-specific objects, one per thread. These entries are of type Object, which means that the hash table does not know the concrete type of the objects it holds. Applications must therefore maintain that knowledge and perform the necessary downcasting, which has the pros and cons discussed in implementation activities 1.1 (484) and 2 (491).

A Java application thread can set the value of a ThreadLocal object foo by calling foo.set(newValue). The foo object of type ThreadLocal uses the thread identifier to return the thread's current object entry from the hash table. The hash table itself is a normal data structure, but calling Collections.synchronizedMap(hashtable) wraps a thread-safe layer around it. This feature combines the Decorator [GoF95] and Thread-Safe Interface patterns (345) to ensure that an existing Java collection will be serialized correctly.

The class java.lang.InheritableThreadLocal is an extension of the ThreadLocal class. This subclass allows a child thread to inherit all thread-specific objects from its parent thread, with values preset to its current parent's values.

Known Uses

Widely-used examples of the Thread-Specific Storage pattern are operating system platforms, such as Win32 and Solaris, that support the errno mechanism. Solaris, for example, defines errno in <errno.h> as follows:

#define errno (*(___errno()))


The ___errno() function invoked by this macro can be implemented as follows, based upon the low-level C thread-specific storage functions we described in implementation activity 1 (484):

int *___errno () {
  // Solaris ensures that static synchronization
  // objects are always initialized properly.
  static pthread_mutex_t keylock;
  static pthread_key_t key;
  static int once;
  int *error_number = 0;

  if (once == 0) {
    // Apply Double-Checked Locking Optimization.
    pthread_mutex_lock (&keylock);
    if (once == 0) {
      // Note that we pass in the <free> function
      // so the <error_number> memory will be
      // deallocated when this thread exits!
      pthread_key_create (&key, free);
      once = 1;
    }
    pthread_mutex_unlock (&keylock);
  }

  // Use <key> to retrieve <error_number> from the
  // thread-specific object set.
  pthread_getspecific (key, (void **) &error_number);

  if (error_number == 0) {
    // If we get here, then <error_number> has not
    // been created in this thread yet. Thus, we'll
    // create it and store it into the appropriate
    // location in the thread-specific object set.
    error_number = (int *) malloc (sizeof (int));
    pthread_setspecific (key, error_number);
  }
  return error_number;
}

The Win32 GetLastError() and SetLastError() functions implement the Thread-Specific Storage pattern in a similar manner.

In the Win32 operating system API, windows are owned by threads [Pet95]. Each thread that owns a window has a private message queue where the operating system enqueues user interface events. Event processing API calls then dequeue the next message on the calling thread's message queue residing in thread-specific storage.

The Active Template Library (ATL) for COM uses the Extension Interface approach to implement the Thread-Specific Storage pattern.

OpenGL [NDW93] is a C API for rendering three-dimensional graphics. An OpenGL program renders graphics in terms of polygons, which are described by making repeated calls to the glVertex() function to pass each vertex of the polygon to the library. State variables set before the vertices are passed to the library determine precisely what OpenGL draws as it receives the vertices. This state is stored as encapsulated global variables within the OpenGL library or on the graphics card itself. On the Win32 platform, the OpenGL library maintains a unique set of state variables in thread-specific storage for each thread using the library.

Thread-specific storage is used within the ACE framework [Sch97] to implement its error handling scheme, which is similar to the approach described in the Example Resolved section. In addition, ACE implements the type-safe thread-specific object proxy using C++ templates, as described in implementation activity 2 (491). The ACE thread-specific storage template class is called ACE_TSS.

Local telephone directory services. A real-life example of the Thread-Specific Storage pattern is found in telephone directory services. For example, in the United States, the 'logically global' number 411 can be used to connect with the local directory assistance operator for a particular area code or region.

Consequences

There are four benefits of using the Thread-Specific Storage pattern:

Efficiency. The Thread-Specific Storage pattern can be implemented so that no locking is necessary to access thread-specific data. For example, by placing errno into thread-specific storage, each thread can reliably and efficiently set and test the completion status of methods called within that thread, without using complex synchronization protocols. This design eliminates locking overhead for data shared within a thread, which is faster than acquiring and releasing a mutex [EKBF+92].

Reusability. Applying the Wrapper Facade pattern (47) and decoupling reusable Thread-Specific Storage pattern code from application-specific classes can shield developers from subtle and non-portable thread-specific key creation and allocation logic. For example, the Double-Checked Locking Optimization pattern (353) can be integrated into a reusable thread-specific object proxy component to prevent race conditions automatically.

Ease of use. When encapsulated with wrapper facades, thread-specific storage is relatively straightforward for application programmers to use. For example, thread-specific storage can be hidden completely at the source-code level by abstractions, such as the thread-specific object proxy templates, or macros, such as errno. Changing a class to or from a thread-specific class therefore simply requires changing the way in which an object of the class is defined.
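For example, based on the Example Resolved code above, making the logger process-global or thread-specific is a one-line change in its definition; client code that accesses it via the -> operator is unaffected (illustrative sketch):

// One Error_Logger shared by every thread in the process:
//   static Error_Logger *my_logger = new Error_Logger;
// One Error_Logger per thread, via the Thread-Specific Storage pattern:
static TS_Proxy<Error_Logger> my_logger;

void report_recv_failure () {
  // Either way, the client code reads the same:
  my_logger->log ("recv failed, errno = %d", my_logger->last_error ());
}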

Portability. Thread-specific storage is available on most multi-threaded operating system platforms. It can be implemented conveniently on platforms that lack it, such as VxWorks or pSoS. Furthermore, thread-specific object proxies can encapsulate platform-dependent operations behind a uniform and portable interface. Porting an application to another thread library, such as the TLS interfaces in Win32, therefore only requires changing the TS_Proxy class, rather than the application code that uses the class.
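For instance, a Win32 variant of the proxy would confine the platform-specific changes to key allocation and the get/set calls. The sketch below assumes a hypothetical TS_Proxy_Win32 template whose members mirror TS_Proxy, with key_ declared as a DWORD, and it reuses the Guard and Thread_Mutex helpers from earlier; it omits a thread-exit hook because Win32 TLS does not offer one.

#include <windows.h>

template <class TYPE>
class TS_Proxy_Win32 {
public:
  TS_Proxy_Win32 (): once_ (false), key_ (0) { }
  TYPE *operator-> () const;
private:
  mutable bool once_;
  mutable DWORD key_;            // Win32 TLS index instead of pthread_key_t.
  mutable Thread_Mutex keylock_;
};

template <class TYPE>
TYPE *TS_Proxy_Win32<TYPE>::operator-> () const {
  if (!once_) {
    // Same Double-Checked Locking Optimization as the Pthreads version.
    Guard<Thread_Mutex> guard (keylock_);
    if (!once_) {
      key_ = TlsAlloc ();          // Instead of pthread_key_create().
      once_ = true;
    }
  }
  TYPE *tss_data = static_cast<TYPE *> (TlsGetValue (key_));
  if (tss_data == 0) {
    tss_data = new TYPE;           // Created on first access per thread.
    TlsSetValue (key_, tss_data);  // Instead of pthread_setspecific().
  }
  return tss_data;
}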


However, the following are liabilities of using the Thread-Specific Storage pattern:

It encourages the use of (thread-specific) global objects. Many applications do not require multiple threads to access thread-specific data via a common access point. In this case, data should be stored so that only the thread owning the data can access it.

Consider our logging server that uses a pool of threads to handle incoming logging records from clients. In addition to writing the logging records to persistent storage, each thread can log the number and type of services it performs. This logging mechanism could be accessed as a global Error_Logger object via thread-specific storage. However, a simpler approach, though potentially less efficient and more obtrusive, is to represent each logger thread as an active object (369), with an instance of the Error_Logger stored as a data member rather than in thread-specific storage. In this case, the Error_Logger can be accessed as a data member by active object methods or passed as a parameter to all external methods or functions called by the active object.
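As a rough sketch of this alternative, assuming a hypothetical Logging_Task class whose svc() hook runs in the active object's own thread of control:

class Logging_Task /* : derived from an Active Object task base class */ {
public:
  // Hook method that runs in this active object's own thread of control.
  void svc () {
    for (;;) {
      // ... receive and store a logging record ...
      // On failure, use the per-thread logger directly, with no TSS:
      // logger_.log ("recv failed, errno = %d", logger_.last_error ());
    }
  }
private:
  // One Error_Logger per task, and therefore per thread.
  Error_Logger logger_;
};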

It obscures the structure of the system. The use of thread-specific storage potentially makes an application harder to understand, by obscuring the relationships between its components. For example, it is not obvious from examining the source code of our logging server that each thread has its own instance of Error_Logger, because my_logger resembles an ordinary global object. In some cases it may be possible to eliminate the need for thread-specific storage, by representing relationships between components explicitly via containment or aggregation relationships.

It restricts implementation options. Not all languages support parameterized types or smart pointers, and not all application classes offer Extension Interfaces (141). 'Elegant' implementation solutions for the thread-specific object proxy therefore cannot be applied for all systems. When this occurs, less elegant and less efficient solutions, such as polymorphism or low-level functions, must be used to implement the Thread-Specific Storage pattern.

See Also

Thread-specific objects such as errno are often used as per-thread singletons [GoF95]. Not all uses of thread-specific storage are singletons, however, because a thread can have multiple instances of a type allocated from thread-specific storage. For example, each instance of an active object (369) implemented via an ACE_Task [Sch97] stores a thread-specific cleanup hook.

The Thread-Specific Storage pattern is related to the Data Ownership pattern [McK95], where a thread mediates client access to an object.

Credits

Tim Harrison and Nat Pryce were co-authors of the original version of the Thread-Specific Storage pattern. Thanks to Tom Cargill for comments on the original version of this pattern.

[11] Note that this use case contrasts with the situation in which multiple threads collaborate on a single task using global or shared data. In that case, the data is not thread-specific and each thread's access to it must be controlled via a synchronization mechanism, such as a mutex or semaphore.


Chapter 6: Weaving the Patterns Together

"The limits of my language are the limits of my world."

Ludwig Wittgenstein

"No pattern is an island, entire of itself; every pattern is a piece of the continent, a part of the main."

Paraphrase of John Donne's 'Devotions'

The patterns in this book can be applied individually, each helping to resolve a particular set of forces related to concurrency and networking. However, just using these patterns in a stand-alone way limits their power unnecessarily, because real-world software systems cannot be developed effectively by resolving problems in isolation.

To increase the power of this book, this chapter shows how the patterns presented in Chapters 2 through 5 connect, complement, and complete each other to form the basis of a pattern language for building high-quality distributed object computing middleware, and concurrent and networked applications. In addition, we outline how many of these patterns can be applied outside the context of concurrency and networking.

6.1 From Individual Patterns to Pattern Languages

The patterns presented in Chapters 2 through 5 are described in a self-contained manner, as are the patterns in [POSA1]. For example, the patterns' contexts are expressed as generally as possible, to avoid limiting their applicability to a particular configuration of other problems, patterns, or designs. The patterns can therefore be applied whenever a problem arises that they address. Moreover, neither the patterns' solution descriptions nor their implementation guidelines focus on the solutions of similar problems described by other patterns. Each pattern references only those patterns that help implement its own solution structure and dynamics.

No Pattern is an Island

Unfortunately, focusing on individual patterns does not support the construction of real-world software systems effectively. For example, many inter-related design problems and forces must be resolved when developing concurrent and networked systems, as shown in the Web server example described in Chapter 1, Concurrent and Networked Objects. These relationships must be considered when addressing key design and implementation issues. Regardless of their individual utility, therefore, stand-alone patterns can only resolve small problems in isolation, as they do not consider the larger context in which they apply.

Even if stand-alone patterns were somehow connected, giving them the potential to solve larger problems, it might be hard to extract these relationships from their descriptions. For example, if two pattern descriptions reference each other, it may not be obvious when one pattern should be applied before the other. When developing software applications and systems with patterns, however, the order in which the patterns are applied may be crucial for their successful integration [Bus00a].

The importance of proper ordering is particularly relevant for architectural patterns, which introduce a structure that defines the base-line architecture for an entire software system. Each component in such a structure is 'complex' by itself, and often these components can be implemented using other design patterns. It is therefore crucial to express the precise relationships between such patterns to determine which pattern to apply first and which later.

For example, the Reactor architectural pattern (179) introduces a participant—event handler—whose concrete implementations may define their own concurrency model. The discussion of an event handler's implementation in the Variants section of the Reactor pattern therefore references applicable concurrency patterns, including Active Object (369) and Monitor Object (399).

Similarly, to illustrate which particular types of event handlers are useful for networked applications, the Reactor's implementation guidelines reference the Acceptor-Connector pattern (285). In turn, this pattern introduces a participant—the service handler—whose concrete implementations may also define their own concurrency model. The implementation guidelines for service handlers thus reference the Active Object and Monitor Object patterns again. This somewhat convoluted set of inter-relationships among the various patterns' participants is illustrated in the figure below:

When reading the Reactor pattern description in isolation, however, it is not obvious how to apply Active Object or Monitor Object effectively in the presence of Acceptor-Connector, which may also use these patterns. For example, the Reactor pattern does not specify whether or not a Reactor should apply Active Object to implement a particular type of event handler. Nor does it specify whether an Acceptor-Connector that uses the Reactor should use Monitor Object to implement its acceptors, connectors, and service handlers.

In general, not all possible combinations of these four patterns are useful. However, because each pattern description is self-contained and independent of the others, it is hard to extract the useful combinations from the individual pattern descriptions.

Towards Pattern Languages

To support the development of a particular family of software systems or application frameworks [POSA1], a broader viewpoint should be applied to the set of available patterns. In particular, patterns should not be considered solely as islands. They should instead be woven into networks of interrelated patterns that define a process for resolving software development problems systematically [Cope96] [Gab96] [Cope97]. We call these pattern languages.

Pattern languages are not formal languages, although they do provide a vocabulary for talking about particular problems [SFJ96]. Together, patterns and pattern languages help developers communicate architectural knowledge, learn a new design paradigm or architectural style, and elude traps and pitfalls that have been avoided traditionally only through costly experience [PLoPD1].

One or more patterns define the 'entry point' of a pattern language and address the coarsest-grained problems that must be resolved when developing a particular type of application or part of an application. When reading this book, you may identify the architectural patterns as those addressing coarse-grained problems in software architecture.


Each entry point pattern specifies which other patterns should be used to resolve sub-problems of the original problem, as well as the order in which to apply these other patterns.

The referenced patterns therefore complete these 'larger' patterns, which in turn specify useful combinations of the 'smaller' patterns in the presence of a particular problem. In software architecture, these smaller patterns often correspond to design patterns. The smaller patterns may then define how to apply other patterns in their own solutions to resolve additional sub-problems. This iterative decomposition of larger into smaller patterns continues until all problems in a given domain are addressed by a designated pattern.

As pattern writers become more familiar with their domain, therefore, they should strive to connect patterns that can complement and 'complete' each other [AIS77]. By applying one pattern at a time [Ale79] [Bus00b] and following the relationships between the patterns, it becomes possible to generate high-quality software architectures and designs. The resulting pattern language is 'synergistic', that is, it is more than the sum of its constituent patterns. For example, the connected patterns help to produce better system architectures by resolving groups of problems that arise during software development. Each solution builds upon the solutions of related problems that are addressed by patterns in the language.

To illustrate this iterative decomposition process, the patterns from the Reactor (179) example shown earlier in this chapter can be integrated to form a mini pattern language. Rather than suggesting the Acceptor-Connector pattern (285) as an option to implement a Reactor's event handlers, we could require that it be applied. In a Reactor's architectural structure there could therefore be three types of event handlers—acceptors, connectors and service handlers—where the latter can be implemented using a concurrency model, such as Active Object (369) and Monitor Object (399):

Refactoring the relationships between these four patterns in this manner has two effects on a Reactor implementation:

§ It ensures that implementors of the Reactor pattern specify the 'right' types of event handlers—acceptors, connectors, and service handlers—associated with the Acceptor-Connector pattern. Relating the Active Object and Monitor Object patterns with Acceptor-Connector also clarifies which type of event handler—the service handlers—is most useful for introducing concurrency.

§ The references to the concurrency patterns can be removed from the Reactor pattern and need only appear in the Acceptor-Connector pattern. This simplifies the relationships between the four patterns and emphasizes the important ones.

This mini pattern language we have created with the Reactor, Acceptor-Connector, Active Object, and Monitor Object patterns helps to generate good software architectures, because common implementation mistakes that might occur if the Reactor or one of the other patterns was applied in isolation can be avoided. Refactoring the patterns' relationships also helps to improve our collective understanding of how the four patterns connect to a broader pattern language that can generate larger architectural structures.

6.2 A Pattern Language for Middleware and Applications

In this section we want to explore the relationships that exist between the patterns described in this book. Our goal is to define the foundation of a pattern language that supports the development of concurrent and networked software systems more effectively than by merely applying the patterns in isolation.

We apply a two-step process to connect the patterns in this book into a pattern language:

§ Identify pattern relationships. Firstly we examine the self-contained descriptions of each pattern to determine which relationships listed in the patterns should be kept and which should be ignored. In particular, we only consider the 'uses' relationship among the patterns and ignore all others, such as the 'is used by' and transitive relationships. We also include all optional uses of other patterns into our set of relationships.

§ Define pattern ordering. Secondly, based on the remaining relationships, we then define the order of the patterns in the language, that is, which patterns are entry points, which patterns follow, and which patterns are leaves. In our language, we define the patterns with the broadest scope—the architectural patterns—as its entry points. The 'uses' relationships then define the ordering between the patterns.

The Pattern Language in Detail

By following the strategy outlined above, we connect the patterns described in this book to a pattern language. This language is summarized in the following diagram. We recommend that you refer back to this diagram as you are reading the pattern language entries.

If a pattern uses another pattern in its implementation, it points to the pattern with an arrow. 'Duplicate' entries for patterns that are frequently referenced by other patterns avoid having too many crossed relationships. Architectural patterns are shaded to indicate where the language 'begins'.

Half-Sync/Half-Async

The Half-Sync/Half-Async architectural pattern (423) structures concurrent systems that can be implemented using a mixture of asynchronous and synchronous service processing.


The pattern introduces two designated layers for asynchronous and synchronous service processing, plus a queuing layer that allows services to exchange messages and data between the other two layers.

If the operating system supports sophisticated asynchronous I/O operations, the Proactor pattern (215) can be used to implement the asynchronous service processing layer. The Active Object pattern (369) and Monitor Object pattern (399) can help implement the queueing layer. The Half-Sync/Half-Reactive variant of the Half-Sync/Half-Async pattern can be implemented by combining the Reactor pattern (179) with the Active Object pattern (369).

Leader/Followers

The Leader/Followers architectural pattern (447) provides a concurrency model that allows multiple threads to take turns sharing a set of event sources, to detect, demultiplex, dispatch, and process service requests that occur on the event sources.

At the heart of this pattern is a thread pool mechanism. It allows multiple threads to coordinate themselves and protects critical sections involved with detecting, demultiplexing, dispatching, and processing events. One thread at a time—the leader—is allowed to wait for an event to occur on a set of event sources. Meanwhile other threads—the followers—can queue awaiting their turn to become the leader.

When the current leader thread detects an event from the event source set, it first promotes a follower thread to become the new leader. Then it plays the role of a processing thread, demultiplexing and dispatching the event to a designated event handler. The event handler in turn performs the required application-specific event processing. Multiple processing threads can run concurrently while the leader thread waits for new events on the set of event sources shared by the threads. After handling its event, a processing thread reverts to a follower role and waits to become the leader thread again.

The Monitor Object pattern (399) can be used to implement the thread pool mechanism that allows multiple threads to coordinate themselves. The Reactor (179) or Proactor (215) patterns can be used to demultiplex and dispatch events from the set of event sources to their designated event handlers.

Reactor

The Reactor architectural pattern (179) structures event-driven applications, particularly servers, that receive requests from multiple clients concurrently but process them iteratively.

The pattern introduces two co-operating components, a reactor and a synchronous event demultiplexer. These demultiplex and dispatch incoming requests to a set of event handlers, which define the application's services that process these requests. Requests from clients are received and responses are sent through handles, which encapsulate transport endpoints in a networked system.

In general there are three types of event handlers—acceptors, connectors, and service handlers—as specified by the Acceptor-Connector pattern (285). The handles that encapsulate the IPC mechanisms are often implemented according to the Wrapper Facade design pattern (47). A thread-safe reactor can be implemented using the Strategized Locking pattern (333). The timer queue mechanism of the Reactor pattern can use the Asynchronous Completion Token pattern (261) to identify the event handler whose timer has expired.

Proactor


The Proactor architectural pattern (215) structures event-driven applications, particularly servers, that receive and process requests from multiple clients concurrently.

Application services are split into two parts:

§ Operations that execute asynchronously, for example to receive client requests.

§ Corresponding completion handlers that process the results of these asynchronous operations, for example a particular client request.

Asynchronous operations are executed by an asynchronous operation processor, which inserts the results of these asynchronous operations—the completion events—into a completion event queue. A proactor then removes the completion events from the completion event queue using an asynchronous event demultiplexer and dispatches them to the appropriate completion handler to finish processing the service. Requests from clients are received and responses are sent via handles, which encapsulate transport endpoints in a networked system.

To support effective demultiplexing and dispatching of completion events from asynchronous operations to their designated completion handlers, the asynchronous operation processor and the proactor can both use the Asynchronous Completion Token pattern (261) to identify which asynchronous operation has finished and which completion handler should process its results. As with the Reactor pattern, there are three general types of completion handlers—acceptors, connectors, and service handlers—as specified by the Acceptor-Connector pattern (285). The asynchronous operation processor, as well as the handles that encapsulate the IPC mechanisms, can be implemented according to the Wrapper Facade pattern (47). A thread-safe proactor can be implemented using the Strategized Locking pattern (333).

Interceptor

The Interceptor architectural pattern (109) allows functionality to be added to an application framework transparently. This functionality is invoked automatically when framework-internal events occur.

The Interceptor pattern specifies and exposes an interceptor callback interface for selected event types internal to a framework. Applications can derive concrete interceptors from this interface to implement out-of-band functionality that processes occurrences of the corresponding event type in an application-specific manner. A dispatcher is provided for every interceptor, so that applications can register their concrete interceptors with the framework. The framework calls back the concrete interceptors via their associated dispatchers whenever the designated event occurs. Concrete interceptors that must modify framework behavior during their event processing can leverage context objects, which provide controlled access to the framework's internal state. Context objects are passed to concrete interceptors when they are dispatched by the framework.

The Extension Interface pattern (141) can be used to help avoid implementing multiple concrete interceptors. Instead, a single interceptor implements multiple interfaces, each corresponding to a particular concrete interceptor. Similarly, the Component Configurator pattern (75) can be used to link concrete interceptors into a concrete framework dynamically at run-time.

Acceptor-Connector

The Acceptor-Connector design pattern (285) decouples connection establishment and service initialization from service processing in a networked system.

This pattern introduces three types of components:


§ Service handlers define one half of an end-to-end service in a networked system and process requests from their connected remote peer.

§ Acceptors perform passive connection establishment, accept connection requests from remote peers, and initialize a service handler to process subsequent service requests from these peers.

§ Connectors perform active connection establishment and initiate a connection to a remote component on behalf of a service handler. The service handler then communicates with the remote component once the connection is established.

Acceptors, connectors, and service handlers send and receive requests from peers via handles, which encapsulate transport endpoints in a networked system.

To process multiple service requests simultaneously, service handlers can be implemented using the concurrency models defined by the Active Object (369) and Monitor Object (399) patterns. The handles used to access the underlying operating system IPC mechanisms can be implemented via the Wrapper Facade pattern (47).

Component Configurator

The Component Configurator design pattern (75) allows a system to link and unlink its component implementations at run-time without having to modify, recompile, or statically relink the application. It also supports the reconfiguration of components into different processes without having to shut down and re-start running processes.

In this pattern a component defines a uniform interface for configuring and controlling a particular type of application service or functionality that it provides. Concrete components implement the interface in an application-specific manner. Applications or administrators can use component interfaces to initialize, suspend, resume, and terminate their concrete components dynamically, as well as to obtain run-time information about each configured concrete component. Concrete components are packaged into a suitable unit of configuration, such as a dynamically linked library (DLL) or shared library, that can be linked and unlinked in and out of an application dynamically under control of a component configurator. This uses a component repository to keep track of all concrete components configured into an application.

Components configured into a networked system using Component Configurator can be acceptors, connectors, and service handlers as defined by the Acceptor-Connector pattern (285), or interceptors, as defined by the Interceptor pattern (109).

Active Object

The Active Object design pattern (369) decouples method execution from method invocation to enhance concurrency and simplify synchronized access to objects that reside in their own threads of control.

A proxy represents the interface of an active object and a servant provides the object's implementation. Both the proxy and the servant run in separate threads, so that method invocations and method executions can run concurrently. At run-time the proxy transforms the client's method invocations into method requests, which are stored in an activation list by a scheduler. The scheduler's event loop runs continuously in the same thread as the servant, dequeueing method requests from the activation list and dispatching them on the servant. Clients can obtain the result of a method's execution via a future returned by the proxy when the method was invoked.


The Extension Interface pattern (141) helps provide role-specific proxies, so that clients access only those services of an active object that they require. The Monitor Object pattern (399) can be used to implement a thread-safe activation list.

Monitor Object

The Monitor Object design pattern (399) synchronizes concurrent method execution to ensure that only one method at a time runs within an object. It also allows an object's methods to schedule their execution sequences cooperatively.

Clients can only access the functions defined by a monitor object via its synchronized methods. To prevent race conditions involving monitor object state, just one synchronized method at a time can run within a monitor object. Each monitor object contains a monitor lock that synchronized methods use to serialize their access to an object's behavior and state. In addition, synchronized methods can determine the circumstances under which they suspend and resume their execution, based on one or more monitor conditions associated with a monitor object.

The Extension Interface pattern (141) helps export role-specific views on a monitor object, so that clients only access those services that they require. The Thread-Safe Interface pattern (345) helps prevent self-deadlock when a synchronized method of a monitor object calls another synchronized method on the same monitor object.

Wrapper Facade

The Wrapper Facade design pattern (47) encapsulates the functions and data provided by existing non-object-oriented APIs within more concise, robust, portable, maintainable, and cohesive object-oriented class interfaces.

The Extension Interface pattern (141) can be used to allow clients to access certain implementation-related aspects of a wrapper facade. The Thread-Specific Storage pattern (475) may be useful when implementing the error-handling mechanism of a wrapper facade on platforms that do not support exception handling efficiently or portably.

Extension Interface

The Extension Interface design pattern (141) prevents bloating of interfaces and breaking of client code when developers extend or modify the functionality of components. Multiple extension interfaces can be attached to the same component, each defining a contract between the component and its clients.

Using the Extension Interface pattern, a component's functionality is exported only via extension interfaces, one for each role it implements. Clients therefore access interfaces but never access component implementations directly. An associated factory is responsible for creating component instances and returning an initial interface reference to clients. Clients can use this interface to retrieve other extension interfaces.

Implementations of the extension interface functionality within components can use the Active Object (369) and Monitor Object (399) patterns to run in their own thread of control.

Asynchronous Completion Token

The Asynchronous Completion Token design pattern (261) allows clients to invoke operations on services asynchronously and to dispatch their subsequent processing actions efficiently when the operations complete and return their results.

For every asynchronous operation that a client invokes on a service, the client creates an asynchronous completion token (ACT) that identifies the actions and state necessary to process the operation's completion. The client passes the ACT to the service together with the operation. When the service replies to the client, its response must include the ACT that was sent originally. The client then uses the ACT to identify the completion handler that processes the results of the asynchronous operation.
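To make the flow concrete, here is a deliberately simplified sketch in which an ACT is just a callable completion action. The invoke_async() service stub merely echoes the token back synchronously to keep the example self-contained; in a real system the ACT would accompany the request across a network connection or an asynchronous I/O interface.

#include <functional>
#include <iostream>
#include <string>

using ACT = std::function<void(const std::string&)>;  // token identifying a completion action

struct Response {
  const ACT* act;              // the token the client originally passed in
  std::string result;
};

// Hypothetical service stub: accepts an operation plus the client's ACT and
// later hands both back together with the operation's result.
Response invoke_async(const std::string& request, const ACT& act) {
  return Response{&act, "reply to " + request};
}

int main() {
  // The client associates a distinct ACT with each outstanding operation.
  ACT on_lookup = [](const std::string& r) { std::cout << "lookup done: " << r << '\n'; };
  ACT on_update = [](const std::string& r) { std::cout << "update done: " << r << '\n'; };

  Response r1 = invoke_async("lookup", on_lookup);
  Response r2 = invoke_async("update", on_update);

  // On completion the client uses the returned ACT to dispatch processing
  // directly, without searching for the matching completion handler.
  (*r1.act)(r1.result);
  (*r2.act)(r2.result);
}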

Thread-Specific Storage

The Thread-Specific Storage design pattern (475) allows multiple threads to use one 'logically global' access point to retrieve an object that is local to a thread—called a 'thread-specific object'—without incurring locking overhead for each access to the object.

The thread-specific objects of a particular thread are maintained in a thread-specific object set. The global access point to a particular thread-specific object can be implemented by a thread-specific object proxy, which hides the details of creating the thread-specific object and of retrieving it from the thread-specific object set whenever its methods are accessed.
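A minimal sketch of this structure, assuming C++11: the thread_local keyword stands in for the thread-specific object set maintained by the underlying threading API, and a small proxy class provides the 'logically global' access point. The ErrorStatus object is a hypothetical example of thread-specific state.

#include <string>

struct ErrorStatus {            // the thread-specific object
  int last_error = 0;
  std::string last_message;
};

class ErrorStatusProxy {        // 'logically global' access point
public:
  ErrorStatus& operator*()  { return instance(); }
  ErrorStatus* operator->() { return &instance(); }
private:
  static ErrorStatus& instance() {
    // One instance per thread, created lazily on first access in that thread.
    thread_local ErrorStatus status;
    return status;
  }
};

// Usage: every thread sees its own copy, so no locking is needed.
//   ErrorStatusProxy errno_like;
//   errno_like->last_error = 42;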

An alternative to accessing the methods of a thread-specific object via the proxy is to use the Extension Interface pattern (141), in which the proxy returns an interface that the thread-specific object implements. The Double-Checked Locking Optimization pattern (353) is often used to create a thread-specific object correctly and transparently in multithreaded applications.

Thread-Safe Interface

The Thread-Safe Interface design pattern (345) minimizes locking overhead and ensures that intra-component method calls do not incur 'self-deadlock' by trying to reacquire a lock that a component already holds. By using this pattern, a component's methods are divided into two categories, implementation and interface methods:

§ Implementation methods, which are internal to the component and cannot be called by its clients, implement the component's functionality, if necessary by calling other implementation methods. Implementation methods 'trust' that they are called correctly and thus do not acquire or release locks.

§ In contrast, interface methods export the component's functionality to clients. These methods first 'check' by acquiring a lock, then delegate the method's execution to an appropriate implementation method, and finally release the lock when the implementation method finishes executing. Interface methods never call other interface methods on the same component. The sketch below illustrates this division.
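This is a minimal sketch using a hypothetical FileCache component: the public interface method acquires a std::mutex exactly once and then calls private implementation methods that deliberately never lock, so nested intra-component calls cannot self-deadlock.

#include <map>
#include <mutex>
#include <string>

class FileCache {
public:
  // Interface method: 'checks' by acquiring the lock, then delegates.
  std::string lookup(const std::string& path) {
    std::lock_guard<std::mutex> guard(lock_);
    return lookup_i(path);
  }
private:
  // Implementation methods: 'trust' that the lock is already held.
  std::string lookup_i(const std::string& path) {
    auto it = cache_.find(path);
    if (it != cache_.end()) return it->second;
    return insert_i(path);               // may call other implementation methods
  }
  std::string insert_i(const std::string& path) {
    return cache_[path] = "<contents of " + path + ">";
  }

  std::mutex lock_;
  std::map<std::string, std::string> cache_;
};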

The Strategized Locking pattern (333) can be used to implement the acquisition and release of locks, as well as to parameterize the type of lock being used.

Double-Checked Locking Optimization

The Double-Checked Locking Optimization design pattern (353) reduces contention and synchronization overhead whenever critical sections of code must acquire locks in a thread-safe manner only once during program execution.

The pattern uses a flag to indicate whether it is necessary to execute a critical section before acquiring the lock that guards it. If the critical section code has already been initialized, it need not be executed again, thereby avoiding unnecessary locking overhead.
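Here is a minimal sketch for the canonical use case, one-time singleton initialization. It uses std::atomic to supply the memory-ordering guarantees the classic pre-C++11 formulation of this pattern relied on the platform for; in modern C++ a function-local static or std::call_once often achieves the same effect more simply.

#include <atomic>
#include <mutex>

class Singleton {
public:
  static Singleton* instance() {
    Singleton* tmp = instance_.load(std::memory_order_acquire);
    if (tmp == nullptr) {                       // first, unsynchronized check
      std::lock_guard<std::mutex> guard(lock_);
      tmp = instance_.load(std::memory_order_relaxed);
      if (tmp == nullptr) {                     // second check, under the lock
        tmp = new Singleton;
        instance_.store(tmp, std::memory_order_release);
      }
    }
    return tmp;                                 // fast path avoids the lock entirely
  }
private:
  Singleton() = default;
  static std::atomic<Singleton*> instance_;
  static std::mutex lock_;
};

std::atomic<Singleton*> Singleton::instance_{nullptr};
std::mutex Singleton::lock_;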

The Strategized Locking pattern (333) is used to implement the acquisition and release of locks, as well as to parameterize the type of lock being used.

Strategized Locking

The Strategized Locking design pattern (333) parameterizes synchronization mechanisms in a component that protect its critical sections from concurrent access. This allows a component's synchronization mechanisms to be implemented as 'pluggable' types. Each type objectifies a particular synchronization strategy, such as a mutex, readers/writer lock, or semaphore. Instances of these pluggable types can be defined as objects contained within a component, which can use these objects to synchronize its method implementations efficiently.

The Scoped Locking idiom (325) can be used to acquire and release a particular type of lock that is parameterized into a component via the Strategized Locking pattern. Moreover, Strategized Locking can templatize guard classes that apply the Scoped Locking idiom, so that synchronization mechanisms can be parameterized transparently.
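The sketch below shows the basic shape of such parameterization, assuming a hypothetical component templated on its lock strategy; std::lock_guard plays the role of the templatized Scoped Locking guard mentioned above, and NullLock is an assumed do-nothing strategy for single-threaded configurations.

#include <mutex>
#include <shared_mutex>
#include <string>

struct NullLock {                 // strategy for single-threaded configurations
  void lock() {}
  void unlock() {}
};

template <typename LockStrategy = std::mutex>
class RequestLog {
public:
  void append(const std::string& entry) {
    std::lock_guard<LockStrategy> guard(lock_);   // Scoped Locking applied to the strategy
    log_ += entry + '\n';
  }
private:
  LockStrategy lock_;             // the pluggable synchronization object
  std::string log_;
};

// Different instantiations select different synchronization strategies:
//   RequestLog<>                   serializes via std::mutex
//   RequestLog<std::shared_mutex>  uses a readers/writer lock (exclusively here)
//   RequestLog<NullLock>           eliminates locking overhead when single-threaded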

Scoped Locking

The Scoped Locking C++ idiom (325) ensures that a lock is acquired when control enters a scope and released automatically when control leaves the scope, regardless of the return path from the scope.

The pattern defines a guard class whose constructor acquires a lock automatically when control enters a scope and whose destructor releases the lock automatically when control leaves the scope. Instances of the guard class are created to acquire and release locks in method or block scopes that define critical sections.
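A minimal sketch of such a guard class follows, parameterized over the lock type so it also composes with the Strategized Locking pattern above; std::lock_guard from the standard library embodies the same idiom.

#include <mutex>

// Guard: acquires the lock in its constructor and releases it in its
// destructor, on every path out of the scope (return, break, or exception).
template <typename Lock>
class Guard {
public:
  explicit Guard(Lock& lock) : lock_(lock) { lock_.lock(); }
  ~Guard() { lock_.unlock(); }
  Guard(const Guard&) = delete;
  Guard& operator=(const Guard&) = delete;
private:
  Lock& lock_;
};

// Usage: the critical section is exactly the guard's scope.
std::mutex table_lock;
void update_table() {
  Guard<std::mutex> guard(table_lock);   // acquired here
  // ... modify the shared table ...
}                                        // released automatically here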

A Discussion of the Pattern Language

The condensed description of the patterns and the pattern relationship diagram above reveal how most of the patterns complement and complete each other in multiple ways to form a pattern language:

§ Although each pattern is useful in isolation, the pattern language is even more powerful, because it integrates solutions to particular problems in important problem areas, such as event handling, connection management and service access, concurrency models, and synchronization strategies. Each problem in these problem areas must be resolved coherently and consistently when developing concurrent and networked systems.

§ The pattern language also exposes the interdependencies of these general problem areas. For example, when selecting a particular event-handling pattern for a networked application, not all potentially available concurrency patterns can be applied usefully.

These two points become clear only when connecting patterns into a pattern language, because each pattern in isolation focuses only on itself, which makes it harder to recognize pattern inter-relationships and to solve more complex system architecture problems effectively. In contrast, a pattern-based design can fulfill a software system's requirements more successfully by integrating the patterns consistently and synergistically.

Our pattern language has been applied to many real-world applications, in particular, though not exclusively, systems built using the ACE framework [Sch97]. This language is therefore an important tool for specifying and implementing middleware and applications. Note, however, that the pattern language is incomplete: it provides the foundation for a larger language for developing distributed object computing middleware, and concurrent and networked applications.

Fortunately our pattern language can be completed by applying other patterns defined in the pattern literature. For example, the Interceptor pattern (109) is orthogonal to most other patterns presented in this book. The Broker architectural pattern [POSA1], however, defines a fundamental structure for distributed software systems that often uses the Interceptor pattern to support out-of-band extensions [NMM99] [HS99a], and the Half-Sync/Half-Async architectural pattern (423) or its Half-Sync/Half-Reactive variant [Sch98b] to structure its broker component. The Broker pattern therefore connects the Interceptor pattern with the other patterns in our pattern language and clarifies their inter-relationships explicitly.

Other patterns from the literature help refine the patterns in our language. For example, peer service handlers defined by the Acceptor-Connector pattern (285) can be implemented using the Half Object plus Protocol pattern [Mes95] and/or the Abstract Session pattern [Pry99]. Similarly, the Remote Proxy variant of the Proxy pattern [GoF95] [POSA1] can be used to implement a particular Interceptor (109) variant. Other examples exist: the Forwarder-Receiver pattern [POSA1] helps implement the Half-Sync/Half-Reactive variant of the Half-Sync/Half-Async pattern (423). Similarly, the Broker [POSA1] and Object Synchronizer [SPM99] patterns can be used to implement variants of the Active Object pattern (369).

Our pattern language can also integrate patterns motivated originally by examples from domains other than concurrency and networking. For example, the patterns described in this book reference many well-known general-purpose patterns, including Abstract Factory, Adapter, Bridge, Command, Decorator, Facade, Factory Method, Iterator, Mediator, Memento, Observer, Singleton, Strategy, and Template Method from [GoF95], Command Processor, Layers, Pipes and Filters, and Reflection from [POSA1], Manager and Null Object from [PLoPD3], and Hook Method from [Pree95].

The integration of all these connected patterns forms a broader pattern language for developing distributed object computing middleware, and concurrent and networked applications. This language can undoubtedly be extended with yet other published patterns or those that remain to be discovered and documented. With each extension the pattern language will become more powerful, complete, and expressive. In this way we can improve the integration of patterns for concurrent and networked software systems.

6.3 Beyond Concurrency and Networking

In the preceding sections we showed how the patterns described in this book, together with patterns from other sources, define the basis of a pattern language for developing distributed object computing middleware, and concurrent and networked applications. While this pattern language accentuates the use of these patterns in this particular domain, many of the patterns also apply outside of it.

Analyzing the Problem sections and Known Uses sections of the patterns reveals that the scope of many of them is broader than the focus of this book implies. Some patterns, for example Wrapper Facade (47), are generally applicable to any domain where it is necessary to encapsulate existing stand-alone functions and data with object-oriented class interfaces. Other patterns apply to particular types of problems that arise in many systems. Extension Interface (141), for example, addresses how to design extensible access to functionality provided by a multi-role component.

In this section, therefore, we outline domains beyond concurrency and networking in which the patterns in this book can be applied.

Graphical User Interfaces

Several of the patterns we describe have been used to design and implement a wide variety of graphical user interfaces:

§ The Wrapper Facade pattern (47) is often used to encapsulate a particular GUI library and conceal its implementation details from application developers. Two prominent known uses of the pattern in the GUI library context are the Microsoft Foundation Classes (MFC) [Pro99] and the Java Swing library [RBV99].

§ Variants of the Reactor pattern (179) have been applied to organize event handling in systems with graphical user interfaces. For example, the Reactor pattern is implemented by the InterViews Dispatcher framework, where it is used to define an application's main event loop and manage connections to one or more physical GUI displays [LC87]. The Reactor pattern is also used in the Xt toolkit from the X Windows distribution.

Components

Several patterns apply in the context of components and component-based development:

§ The Wrapper Facade pattern (47) specifies how to implement collections of cohesive low-level components and apply them in various contexts, such as components for threading and interprocess communication [Sch97].

§ The Component Configurator pattern (75) supports the dynamic configuration and reconfiguration of component implementations. In addition to being used as the basis for installing operating system device drivers dynamically [Rago93], this pattern is also the basis for downloading and configuring Java applets dynamically [JS97b].

§ The Interceptor pattern (109) introduces a mechanism for building extensible components and applications. Its Known Uses section lists contemporary component models that apply this pattern, such as Microsoft's Component Object Model (COM) [Box97], Enterprise JavaBeans (EJB) [MaHa99], and the CORBA Component Model (CCM) [OMG99a].

§ The Extension Interface pattern (141) defines a general mechanism for designing components and allowing clients access to their services. All contemporary component standards, such as COM, EJB, and CCM, implement variants of this pattern.

General Programming

Some patterns or idioms in this book can be applied to programming in general:

§ Scoped Locking (325) is a specialization of a general C++ programming technique for safe resource acquisition and release. This technique, described in a more general context by Bjarne Stroustrup in [Str97], is known as 'Object-Construction-is-Resource-Acquisition'.

§ Double-Checked Locking Optimization (353) can be used to protect code that should be executed just once, particularly initialization code.

In summary, the seven distinct patterns and idioms discussed above—Wrapper Facade, Reactor, Component Configurator, Interceptor, Extension Interface, Scoped Locking, and Double-Checked Locking Optimization—are clearly applicable beyond the scope of concurrency and networking. If you analyze well-designed software systems you will probably discover other domains in which these or other POSA2 patterns apply. Although this book has presented the patterns primarily in the context of developing concurrent and networked systems, it is important to recognize that these patterns can help resolve recurring problems in other domains.

6.4 Pattern Languages versus Pattern Systems

The previous sections explore the pattern language aspects of the patterns presented in this book. In addition to defining the foundation of a pattern language for building distributed object computing middleware, and concurrent and networked applications, however, we can also organize the patterns in this book into a pattern system [POSA1]. For example, we can extend the pattern system defined in [POSA1] with the problem areas covered by patterns in this book: service access and configuration, event handling, synchronization, and concurrency. We can then reclassify the patterns accordingly.

This classification scheme presents an interesting conceptual exercise for taxonomizing the pattern space. Each pattern can be classified and assigned to a cell in a multi-dimensional matrix, with each dimension of the matrix denoting a particular pattern property. If isolated problems must be resolved, this taxonomy can enable rapid access to potentially useful pattern-based solutions. The following table shows one way to organize the patterns from this book, together with selected patterns from [GoF95] [POSA1] [PLoPD1] [PLoPD2] [PLoPD3] [PLoPD4], into a pattern system for concurrency and networking.[1]

Base-line Architecture
Architectural patterns: Broker, Layers, Microkernel

Communication
Architectural patterns: Pipes and Filters
Design patterns: Abstract Session [Pry99], Command Processor, Forwarder-Receiver, Observer [GoF95], Remote Operation [KTB98], Serializer [RSB+97]

Initialization
Design patterns: Activator [Stal00], Client-Dispatcher-Server, Evictor [HV99], Locator [JK00], Object Lifetime Manager [LGS99]

Service Access and Configuration
Architectural patterns: Interceptor
Design patterns: Component Configurator, Extension Interface, Half Object plus Protocol [Mes95], Manager-Agent [KTB98], Proxy, Wrapper Facade

Event Handling
Architectural patterns: Proactor, Reactor
Design patterns: Acceptor-Connector, Asynchronous Completion Token, Event Notification [Rie96], Observer [GoF95], Publisher-Subscriber

Synchronization
Architectural patterns: Object Synchronizer [SPM99]
Design patterns: Balking [Lea99a], Code Locking [McK95], Data Locking [McK95], Guarded Suspension [Lea99a], Double-Checked Locking Optimization, Reader/Writer Locking [McK95], Specific Notification [Lea99a], Strategized Locking, Thread-Safe Interface
Idioms: Scoped Locking

Concurrency
Architectural patterns: Half-Sync/Half-Async, Leader/Followers
Design patterns: Active Object, Master-Slave, Monitor Object, Producer-Consumer [Grand98], Scheduler [Lea99a], Two-phase Termination [Grand98], Thread-Specific Storage

To some extent, however, categorizing patterns according to specific areas or properties fails to capture the relationships and interdependencies that exist among a particular set of patterns. These relationships and interdependencies influence a pattern's applicability in the presence of other patterns, because not every pattern can be combined with every other in a meaningful way.

For example, Thread-Specific Storage (475) may be inappropriate for use with a Leader/Followers (447) thread pool design, because there may not be a fixed association between threads in the pool and events that are processed over time. In general, therefore, it is important to identify the 'right' pattern combinations when building real-world systems, because these systems exhibit dependencies between many problems that must be resolved.

Large pattern systems also tend to be complex, because the more problem areas a pattern system includes, the more likely a pattern is to be assigned to multiple categories. The Wrapper Facade pattern (47) is a good example of this phenomenon, because it can be used to build concurrent and networked systems, graphical user interfaces, and components, as discussed in Section 6.3, Beyond Concurrency and Networking. As a result, it would appear at least three times in a pattern system that covered all these domains. As the number of patterns increases, a pattern system may become bloated by such repeated pattern entries, making it hard to learn and use.

One way to resolve this problem is to specify a number of smaller pattern systems for particular domains, rather than a universal pattern system. Examples include our pattern system for concurrency and networking described earlier, or a pattern system for component construction. If you apply this approach to the patterns presented in this book, however, you will notice that the resulting pattern system is structurally similar to the pattern language whose foundations we specified in Section 6.2, A Pattern Language for Middleware and Applications.

Moreover, the resulting pattern system would not emphasize the relationships between the patterns as well as the pattern language we described. The patterns would instead remain islands, and as a result the pattern system would be less useful for the pattern-based development of complete software systems for concurrent and networked middleware and applications.

It has been our experience that organizing the patterns presented in this book as a pattern language is more effective than classifying them in a pattern system. Much research remains to be done, however, to identify, document and integrate all patterns that are necessary to complete this pattern language.

[1]Patterns described in this book are in bold, patterns from [POSA1] are in italics, and other patterns from the literature are in regular font.