Chapter 4: Concurrency
Do you remember when single-process concurrency emerged in the eighties? Although the potential performance benefits of this featurenamed threadswere huge, using it was such a mess! Many of us were using C and C++ in those days, and the question of library thread safety became central for anyone doing multithreaded programming in those languages. If a library was not thread safe, you had to protect access to it from all locations in your source that were using it. If you forgot to protect access to your library in even one instance, you usually guaranteed yourself crashes that were hard to track and occurred only once a week, usually in your customers’ hands. Such problems could take weeks to debug, and sometimes it was easier to just rewrite the codeand hope that you would not make the same mistake twice. In an attempt to alleviate the problem, library vendors invented a standard scale to describe the level of thread safety of their products; however, the effort ultimately proved to be little help. And what about the C library itself? Was it safe to call strtok in more than one thread? The advent of multithreaded programming did as much to enhance performance as it did to degrade maintainability and robustness, and to erode project schedules. Programmers needed object middleware to fix these problems, and COM+ fit the bill.
In its infancy, the COM library itself was not thread safe. In fact, you were allowed to call CoInitialize and therefore use COM from only a single thread in your process. This is understandable, since the technology had to first establish itself as a binary compatibility standard before it could provide a more comprehensive object framework.1 But COM+ has come a long way since then. First, COM introduced the apartment model to manage the relationship between objects and threads. Later, Microsoft Transaction Server (MTS) added the concept of activities, which essentially are groupings of COM objects in a call chain that does not allow concurrency in the participating objects. Finally, COM+ added configurable synchronization domains, which decouple an object’s thread affinity from its synchronization needs and give the developer the power to determine which objects can be accessed concurrently and under which circumstances. While our current options give us tremendous power and flexibility, they also amount to a bewildering array of choices. As a result, concurrency management has become not only what I consider the area of greatest strength for COM+, but its most poorly understood topic.
Still, programmers need to fully understand concurrency management techniques and synchronization options. While only a small percentage of source code in a typical software project must be designed and written to perform well, those critical areas really do have to execute fast; otherwise, your software will fail in the eyes of your users. To achieve this performance, you must understand the technology you have at your disposal. Without this critical understanding, you risk jeopardizing not only your object implementation, but also the fundamental object design of your project, where fixing mistakes can be costly. The good news is that well-managed concurrency will give your software the speed that it needs to succeed, and that COM+ presents middleware technology eminently suited to manage synchronization in your project. The bad news is that you must get a firm grasp on this rather complex technology before you can apply it, and before you can produce a design poised to take advantage of your options. Let’s get to it, and explore the technology of COM+ concurrency from the inside out.
It is not the place of object middleware to regulate reentrancy. Only your object design can ensure that your object will not be called back while waiting for an outgoing call to return, in the event it cannot handle such a call. Your design might need to provide this assurance because blocking a reentering call would guarantee deadlock. In fact, COM+ puts some effort into ensuring that callbacks are always serviced and therefore never result in deadlock. This implementation is more challenging than you might think, since a callback can occur on threads other than the one waiting for the return of the method invocation. We will look at this issue more closely when we examine locking (in its own section) later in this chapter.
The generality of a true interceptor presents a challenge during implementation. A generic interceptor does not know the shape of the interface it will wrap once it is compiled. This lack of information at compile time creates a very interesting set of difficulties.
The Language Problem
Procedural high-level languages, including C and C++, generally are not capable of calling a function with an unknown parameter list. By using the ellipsis and va_ set of functions, you can implement a function that does not know what parameters it will be called with at compile time. However, you cannot tell the compiler to make a call to a function and simply pass the parameters that were passed to the function making the call.
This problem can be overcome only by using a piece of assembly language to make the call from the interceptor to the wrapped object. Essentially this assembly code must make the call to the target function in the wrapped object while leaving the stack frame unchanged. However, the C compiler will already have altered the stack frame with a standard function prologue segment, which lets you access local variables. Microsoft Visual C++ offers the __declspec(naked) storage class attribute, which will prevent function prologue and epilogue generation. Obviously, implementing naked functions is difficult and, along with the necessary assembly segment, requires a thorough understanding of the processor architecture for which you compile your code.
The Failure Problem
COM+ interface methods always use the __stdcall calling convention. This convention has the callee, not the caller, clean up the stack before returning from the function. This is no problem if you actually can make the call to the object for which you are intercepting, but what if your interception task fails? What if you would rather not make the call under a certain set of circumstances, or what if the object you want to dispatch the call to is unavailable? Now you are responsible for removing parameters from a stack frame whose shape you don’t even know.
Of course, there are ways to get around this problem. For example, you might require the caller to tell you the combined size of all parameters. However, this approach is somewhat awkward and makes your interception hardly transparent. Or you might try to derive the combined parameter size by querying the ITypeInfo interface of the target object. Of course, the object might not support this interface, in which case you could attempt to create a stub for the interface you want to wrap, and interpret its CInterfaceStubVtbl structure, defined in RpcProxy.h. And your interceptor must create a stub and interpret the structure before your wrapper function is called, since determining stack frame size inside the wrapper cannot tolerate failure. By now you’ve probably guessed that doing this will require significant effort.
The Post-Processing Problem
Your interception task might require work before making the call to the wrapped object, as well as afterward. This means that after the wrapped function is complete, it must return to your wrapper rather than that wrapper’s caller. Therefore, you need to change the return address on the stack so that it points within the wrapper function. But how do you remember the address of the caller to which you must return after you finish post processing? After all, there is no place on the stack to store this information.
You can save the final return address in thread local storage (TLS). But allocating a new TLS slot can be expensive if interceptor calls nest on the same thread, and you might find yourself running out of slots. Therefore, you should manage a stack of return addresses via a single slot, instead of allocating a new slot for each function invocation.
Make no mistake: implementing generic interception is very challenging and is nonportable. Even if you never need to implement an interceptor in your own software,2 understanding the issues of the task gives you a better grasp of what is happening inside the COM+ middleware, if not sympathy for the developers who created it.
Legacy in-process servers sometimes use this setting because their objects were written under the assumption that they could share global in-process server data without requiring locking. An ActiveX DLL created by Microsoft Visual Basic also supports this setting.3 The setting is not recommended for new projects because it can lead to contention among all the COM objects that are forced to share the same thread.
An object can also declare ThreadingModel equal to Both, in which case it will be created in the apartment of its caller. The value Both is used for historical reasons: it originated at a time when COM supported only two apartment types. An unconfigured component using this setting might experience concurrency, as its creator might be an MTA thread or a TNA object. The primary motivation for using this setting is to eliminate an apartment boundary between an object and its instantiator.
Table 4-1 illustrates which apartment COM and COM+ will choose for instantiation of a new unconfigured object, given that object’s ThreadingModel and the instantiating thread’s apartment membership. (Of course, the TNA row and columns are relevant to COM+ only.)
Table 4-1 Instantiation Apartment Selection
Whenever a thread invokes a method on an object across an apartment boundary, the invocation is intercepted by the object’s proxy, routed via the channel, and then delegated to the object by the stub on the other side of the channel. The COM+ middleware performs three important functions when intercepting method calls across apartments:
Making a call to an object in the TNA within the same process never requires switching to a different thread. Only the last item in the previous list needs to be performed by an interceptor guarding access to the in-process TNA. Such an interceptor is sometimes called a lightweight proxy. Compared to the overhead of a thread switch, a lightweight proxy is very fast. But the lightweight proxy still needs to perform object reference conversion, as shown in the graphic at the top of the next page. For this reason, a TNA interceptor needs access to the proxy/stub DLL of the interface it is encapsulating, or in the case of a type library–marshaled interface, to its type library. Such access is also necessary for the interceptor to handle the failure problem inherent in general interception.
One of the consequences of a user thread’s inability to join the TNA is that the thread always incurs the overhead of at least a lightweight proxy when creating or calling an object within the thread-neutral apartment. This is not necessarily true for creating or accessing an object in one of the other apartment types. If a user thread performing a watchdog activity or other periodic task needs frequent access to COM+ objects, placing these objects in the TNA could impede performance. However, such situations are relatively rare and in most architectures are confined to "system" type objects rather than objects containing business logic.
Under normal circumstances, the instantiator of an object implicitly selects the object’s apartment. But it is typical to see a client take control of in-process server concurrency by first creating a single-threaded apartment and then issuing the instantiation call from this new apartment. This approach can be effective in situations where the client process ultimately is aware of how server objects are being used, and how their concurrency can best be exploited.
The justification for switching threads involves an STA thread’s need to service a message loop somewhat frequently, and with the expectation of the MTA object developer. This justification is not so much connected to the mechanism the channel uses when making an outbound call from the MTA: normally this mechanism involves blocking the calling thread until the method invocation returns, but the channel has the power to discover a thread’s native apartment membership and can enter a message loop waiting for call return if the calling thread is an STA thread, even when making a call from an MTA object. The real problem is that the MTA programming model allows the object developer to unconditionally block threads and to do so for arbitrary periods of timeusually for synchronization purposes or when waiting to access a resource. Therefore, within MTA object implementations, you frequently will find calls to EnterCriticalSection, WaitFor.SingleObject, and WaitForMultipleObjects that have long time-outs. The STA thread must protect itself from running into code like this; it does so by waiting in a message loop at the apartment boundary and having an MTA thread block in the synchronization APIs instead.
In other situations, the server is best equipped for arranging object-to-.apartment distribution. This includes cases where server objects do not run in the client’s process space, making the client unable to affect target apartment selection. At first it might appear that the server is unable to influence this apartment selection, since COM+ makes all the choices. This impression might have been furthered by my choice of words: to simplify the discussion, I always speak of the apartment in which the object will be created. In fact, the object’s class factorynot the object itselfwill be instantiated in the apartment by the COM+ library. Therefore, it is up to the class factory to create the actual object. Normally the class factory creates the object inside its own apartment, but it does not have to. The indirection of the class factory in the object creation process gives servers the ability to take control over an STA object’s target apartment.
Visual Basic allows developers to multiplex objects, created externally or internally by means other than New, across a thread pool of a fixed size on a per-project basis. Alternatively, developers can specify that each object created in this manner should be located in a new apartment. These options are available to local servers only. Of course, C++ developers implementing their factories manually can do whatever they like to control target apartment selection or creation. But when using the Active Template Library (ATL), you can achieve multiplexing across a pool by using the macro DECLARE_CLASSFACTORY_AUTO_THREAD. By default, the size of the thread pool used by this mechanism will be four times the number of processors of the system on which your code executes. This dynamic way of determining pool size makes more sense than the Visual Basic approach, because a pool that is too large will degrade performance as much as a pool that is too small will cause insufficient object concurrency.
Before the days of MTS, COM interception was married to the concept of the apartment, and apartments are about threads. But uniting the new services with the thread infrastructure did not make sense, so a tighter execution scope for COM objects was needed. This new innermost execution scope is called the context. An object now resides within a context, and an apartment bounds a context. Extended COM+ run-time services are performed by interceptors, which must exist between any two objects that do not reside in the same context. As long as the interception occurs between contexts in the same apartment or on a TNA-bound call, these interceptors generally are as efficient as any lightweight proxy that does not require a thread switch. But recall that the interceptor still needs access to your proxy/stub DLL or type library, for the same reason a lightweight proxy requires this access. Two objects with similar configurations and the same interception needs may share the same context, but certain services require that an object be created in an entirely new context. Threading model setting permitting, an unconfigured object, which, by its very nature does not ask for and is unaware of extended services, is always co-located in its instantiator’s context.
Of particular interest is a COM+ service specifically dedicated to managing concurrency in an object. I have to admit that I was initially quite confused by this service. Having worked with the apartment model for years, the concepts of thread affinity and synchronization had become indistinguishable in my mind. But upon later reflection, I realized that while a relationship between the concepts exists, the concepts for the most part are independent. There is no reason, after all, to not serialize access to either a multithreaded or a thread-neutral object. In Essential COM (Addison-Wesley, 1998), Don Box speculated about a new apartment type he called the rental-threaded apartment (RTA). This apartment type would have behaved just like the thread-neutral apartment of COM+, but with synchronization built in. This is just the type of idea that was bound to emerge from a mindset that identifies apartments with concurrency management. Yet the decoupling of the COM thread management construct (the apartment) from the synchronization construct (contextual concurrency management) feels conceptually pure and gives us more flexibility: we now can build an object that any thread can enter (as the RTA would have allowed), but we can choose whether to synchronize method invocations (which the RTA would not have allowed).
Figure 4-1 shows all possible synchronization settings for an object. The values are identical to those available for configuring transaction support. However, instead of being enlisted in or beginning a new transaction, an object with the value Supported, Required, or Requires New participates in or begins what is known in COM+ as a synchronization domain. Under MTS, synchronization domains were known as activities and were configurable only through the method a client chose to instantiate another object. Unfortunately, it therefore was up to the client to determine whether a new object could join its activity. Under COM+, this has become transparent to the instantiator and is controlled solely by the setting in the property sheet shown in Figure 4-1.
Figure 4-1. Concurrency tab of the property sheet of a configured thread-neutral object.
Like a transaction, a synchronization domain can include objects in different contexts, apartments, processes, and hosts.4 Also, a synchronization domain is formed through the creation of an object with the setting Required (made by a caller currently outside any synchronization domain) or Requires New (made by any caller); the synchronization domain then flows to any object with the setting Supported or Required at instantiation time. In a synchronization domain, only one physical thread can execute at a time, and each thread must execute as the result of either a direct or indirect synchronous method invocation from the thread that first entered the domain. Figure 4-2 shows a synchronization domain that spans contexts and hosts with several physical threads.
Figure 4-2. Thread and synchronization domain interaction.
The fact is that objects in the single-threaded apartment, which do participate in a synchronization domain, act quite differently from objects that do not participate in a synchronization domain. These differences include the following:
As you can see, for the most part COM+ synchronization layers its benefits on top of single-threaded synchronization. Understanding this relationship certainly will be useful in your own projects.
An object with the synchronization setting Disabled is unaware of synchronization services and behaves like an unconfigured component. Notice that this setting is not the same as Not Supported: the latter ensures that an MTA or TNA object can receive calls concurrently, while the former can result in the object being located in the caller’s synchronized context. Disabled also is not the same as the Supported setting, which will force the object into a context that participates in the same synchronization domain as it would were the caller a member of a synchronization domain. But if the object requires a different context than that of the caller (for example, because the object has a different threading model, or because of other COM+ service configurations), the contexts must communicate to prevent concurrent execution. COM+ might achieve this communication by having the contexts share some type of lock. But when the setting is Disabled, the target context will not participate in such a locking scheme. This is why the Disabled setting truly has a unique meaning.
The Required and Requires New settings mean that the object must run in a synchronization domain. Requires New ensures the object always will be the root of a new synchronization domain. Required creates a new domain only if the instantiator does not already participate in one.
Synchronization Implementation by Deadlock
The theory behind context-based synchronization is fantastic, and the sheer number of options now available to developers should tremendously simplify situations that previously required you to build your own plumbing to achieve just the right concurrency behavior. However, when I examine the current COM+ implementation of the synchronization services for single-threaded objects, my enthusiasm for the technology wanes.
A call into an executing synchronization domain from a caller not participating in that synchronization domain can cause deadlock. A deadlock occurs if the call is made through an object in the synchronization domain whose threading model is Apartment and if that object’s thread is currently servicing a message loop (for example, because it is waiting for an Object RPC call return). The message loop does not need to be within the bounds of the object being accessed concurrently for the deadlock to occur. These are the deadlocked entities:
All these callers now are stalled because the message loop thread attempts to gain access to a lock on behalf of the new inbound caller. This lock never will become available because releasing it would require returning this very thread from a method invocation that is now further up the stack, as shown in Figure 43. Hence, the thread never can return from the DispatchMessage call in the message loop, and the call chain becomes stalled from that level upward.
Figure 4-3. Example of a deadlock.
This issue is mitigated by the fact that concurrency often is regulated by a layer of technology in front of the COM+ object layers in modern COM+ architectures. The sharing of object references is discouraged in such highly scalable environments. (For more on this topic, see Chapter 13.) Nevertheless, the issue begs the question of why synchronization support is even an option for STA objects when enforcing it is guaranteed to result in deadlock. It is important to understand the situation precisely, since by its very nature it likely will cause sporadic bugs under just the right timing conditions, and often will be extremely hard to debug. Until Microsoft solves this problem, the only safe thing to do is religiously avoid sharing object references to synchronized STA objects. If you must share object references, be absolutely certain that synchronous calls cannot occur in your architecture. One warning: making concessions at that level of your architecture is likely to introduce brittleness.
There is both a server and a client aspect of the message filter mechanism. Server and client applications can install their own message filter by calling CoRegisterMessageFilter, passing an object that implements the IMessageFilter interface:
IMessageFilter : public IUnknownOn the server side, COM+ will call HandleInComingCall before dispatching an inbound call to an object. On the client side, COM+ will call RetryRejectedCall when the server does not dispatch the call but flat out rejects it or advises retrying it later. The code calls MessagePending when Windows messages are received by the client while waiting for a COM+ call to return. The dwCallType parameter of HandleInComingCall lets the server know whether an incoming call is from a new causality while the object’s thread waits for an outgoing call to return (CALLTYPE_TOPLEVEL_CALLPENDING). Such calls therefore are deferred easily (return SERVERCALL_RETRYLATER).
This all sounds marvelous, but there are some limitations. First, the message filter shows its user interface orientation by allowing only local servers, not DLL components, to register a message filter. This means no configured object can use this technique and therefore excludes today’s most popular and flexible kind of COM+ component. Second, unless all clients can be guaranteed to be in process, rejecting a call with the retry flag can put an unacceptable burden on the caller. Unless the client’s message filter specifies that such a rejected call should indeed be retried, an error message immediately will be returned to the caller. This caller might not be able to set its own message filterfor example, it might have been loaded from a DLL in another process or host, or it might have been implemented in a development system that does not permit setting a message filter. It is not reasonable to expect that all clients either have a message filter that retries or performs the retry at the level of every single COM+ method invocation. A Visual Basic Standard EXE will install a message filter that does retry for some time and then uses OleUIBusy to bring up the infamous Server Busy dialog box, giving the user a chance to manipulate the user interface of the server application and thus resolve the impasse. But the default COM+ message filter does not do this, instead it propagates all retry rejections directly to the caller. Therefore, even COM+ objects implemented in Visual Basic are subject to this error when operating in a host process implemented with, say, Visual C++.
The bottom line is that message filters were designed in another era, for a problem more specific than regulating general STA object concurrency. If message filters happen to be applicable to your situation, great! By all means use themthey won’t go away any time soon. But chances are that forcing message filters into a modern architecture will be like trying to fit square pegs into round holes.
Now let’s take a look at some of the other COM+ services performed by interceptors:
Figure 4-4. Transactions tab of the property sheet of a configured object.
Figure 4-5. Security tab of the property sheet of a configured object.
Figure 4-6. Activation tab of the property sheet of a configured thread-neutral object.
An object’s threading model can have an effect on what services will be available to it. For example, single-threaded objects cannot be pooled. Interdependency also exists among certain services. For instance, supporting or requiring transactions (including new ones) forces an object to use just-in-time activation. The specific settings of Supported and Required also force a setting of Required for synchronization support. The transaction setting Requires New implies the setting Required or Requires New for synchronization support. Apart from that, enabling just-in-time activation also demands that the object require an existing or new synchronization domain, regardless of the transaction setting.
The implementation you seek here is a context-neutral onethat is, you want an object whose methods can be called from anywhere in the process, without the aid of any type of interceptor. Implementing a context-neutral object goes somewhat against the grain of the COM+ context and apartment model; in fact, it might be impossible to achieve from a high-level development environment. You can achieve this implementation using C++, although doing so is not a trivial task. Let’s see what the issues are and whether any alternatives exist.
Providing your own implementation of IMarshal neatly solves the problem of interceptor creation. After all, an interceptor is created only as the result of marshaling an interface pointer by the standard marshaling architecture of COM+. And all marshaling, whether invoked implicitly through transporting your interface pointer in a method call bound for another context or explicitly by invoking a marshaling API, eventually passes through CoMarshalInterface, which gives your object a chance to provide custom marshaling by implementing IMarshal. When you use this approach, in effect your object joins the context of its caller for the duration of the call. If you access interface pointers passed to your object, this access is made on the physical thread of the caller and through interceptors configured for your caller’s context, if any exist. If you create an object during a call, this creation occurs precisely as though performed by the actual caller.
This maneuver became so popular that Microsoft made available a canned implementation of it in the COM library, in the form of an object called the free-threaded marshaler (FTM). (This happened before the release of COM+ in Windows 2000, back in the days of Windows NT 4.) Instead of having to implement IMarshal yourself, you now can call CoCreateFreeThreadedMarshaler and aggregate the object returned from this call. It’s that simple. The ATL object wizard even includes a Free Threaded Marshaler check box that adds those few lines of code into your new context-neutral object.
Last Updated: Friday, July 6, 2001