'Post Modern Times'

Having spent the better part of the last year monkey-patching, mixing-in, and duck-typing my way through vast brambled acres of Ruby code, I at last threw up my hands and decided it was time for a retreat to the warm and fuzzy land of type-safe languages. The result was jl_signal, a C++ signals/slots implementation based on a system I used for a game project a while back. Source code and API documentation are available on GitHub.

This system lets you create a list of functions at runtime and invoke the entire list with a single function call. This is useful for writing GUIs and other event-driven systems where generic components can trigger all kinds of side-effects. The API has minimal syntax burdens and is easy to use, with certain caveats. It is also fast, and doesn’t allocate to the heap.

Here’s what it looks like:

// Declare and instantiate a signal for functions that take a
// single char arg.
JL_SIGNAL( char ) oKeyPressSignal;

// Two objects of unrelated type.
Piano oPiano; // plays notes
Transcriber oTranscriber; // generates text logs

// Let's assume the following instance method declarations:
//
// void Piano::PlayNote( char nKey );
// void Transcriber::LogInput( char c );
//
// Now let's connect these objects to oKeyPressSignal:
oKeyPressSignal.Connect( &oPiano, &Piano::PlayNote );
oKeyPressSignal.Connect( &oTranscriber, &Transcriber::LogInput );

for (;;)
{
    // Gather user input.
    char nKey = getchar();

    // Terminate if the user presses ENTER.
    if (nKey == 10)
    {
        break;
    }

    // Now let's emit the signal with the user's input. This is the
    // equivalent of making the following function calls:
    //
    //     oPiano.PlayNote( nKey );
    //     oTranscriber.LogInput( nKey );
    oKeyPressSignal.Emit( nKey );
}

Below are some remarks on motivation and implementation.

Signals/slots and the observer pattern

The purpose of signals/slots, as with all subspecies of the observer pattern, is to allow you to attach arbitrary side-effects to a given piece of application logic. Another way to say that is that it allows you to isolate (or decouple, for those of you playing a programming blog drinking game) the code that implements your application logic from the code that implements said logic’s side-effects.

For example, when you’re implementing a clickable button, you shouldn’t be concerning yourself with what happens when the button is clicked; you really only want your button code to concern itself with how to draw the button, and how to respond to input from the mouse or keyboard. Outside the button, we may have any number of observers that are interested in button clicks, perhaps a sound manager that plays sound effects, or a window handler that closes the application. It would be nice to have a way to connect any of these observers to specific button instances while keeping the button implementation as generic as possible.

With that example in mind, a signals/slots system should produce code that is roughly analogous to the following English sentences: “Buttons in my program emit a click signal. On menu X, I want to connect button Y’s click signal to the nuclear weapons manager’s LaunchEverything() function.”
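In code, those sentences might translate into something like the following sketch. Button, NuclearWeaponsManager, and SetUpMenuX are hypothetical names invented for illustration; only the signal declaration and the Connect() call come from jl_signal:

// Hypothetical types for illustration.
class NuclearWeaponsManager
{
public:
    void LaunchEverything() { /* ... */ }
};

class Button
{
public:
    // "Buttons in my program emit a click signal."
    JL_SIGNAL()* ClickSignal() { return &m_oClickSignal; }

private:
    JL_SIGNAL() m_oClickSignal;
};

void SetUpMenuX( Button& oButtonY, NuclearWeaponsManager& oNukeManager )
{
    // "Connect button Y's click signal to the nuclear weapons manager's
    // LaunchEverything() function."
    oButtonY.ClickSignal()->Connect( &oNukeManager,
                                     &NuclearWeaponsManager::LaunchEverything );
}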


Now, your typical C++ implementation of the observer pattern will use some kind of event object that gets passed from event source to event observer. Each observer class needs to overload a virtual handler method that receives event objects from sources, and runs them through some kind of dispatch table, most often consisting of a big switch statement.

class ExampleObserver : public EventObserver
{
    ...
    virtual void HandleEvent( Event e )
    {
        switch ( e.id )
        {
        case eAppEvent_Click:
            ...
            break;
        case eAppEvent_WindowClose:
            ...
            break;
        ...
        }
    }
};

This strategy is not without its benefits. For one thing, it’s reasonably straightforward to implement. For another, it makes it possible for your code to reason about events as data, so you can log them, serialize them, queue them, or what have you.

But it also implies a certain syntactical burden on the API user. All of your observer classes would need to inherit from an event-receiving base class, and it’s a non-trivial sort of inheritance too, since you’d have to overload a virtual method and inject it with some kind of control structure to handle the dispatch. On top of this, you’d have to come up with some way of distinguishing between different types of events, which almost always implies the existence of a big enum of event IDs that forces you to recompile your entire codebase when you modify it in any way.

By contrast, a signals/slots system in the Qt tradition operates directly on functions. Indeed, when one speaks of “slots”, one simply means “a function that might get connected to, and subsequently invoked by, a signal object”. Once you have declared that a signal will be emitted with a certain set of parameters, you should be able to connect that signal with any function that receives those parameters. If that function happens to be an instance method, you should also be able to connect a signal to it without worrying about which class the method happens to belong to.

Containable delegates

If we can connect signals to arbitrary functions, that means we must have some means of creating lists of those functions at runtime, and a means to invoke them at a later time.

Let’s use the word delegate for an object that is able to store references to arbitrary functions of a specific signature. In the object-oriented world, this really implies two references: one to an instance method, and one to the object that scopes the method. If you “invoke” the delegate, you are asking it to call the instance method on the object.

On a mechanical level, a signal is not much more than a container of delegates. When a signal is emitted, it really just iterates through its internal list of delegates and invokes them one by one. In high-level languages, this is trivial to implement, but it gets rather hairy when you have to worry about type consistency.

In C++, it is straightforward to build lists of function pointers of the same signature, but once we have to store method-object pointer pairs, we need to find a way to get around the fact that the objects may have unrelated types. The simplest way to do this is via inheritance, plus a light dusting of template magic:

// For the purposes of this example, let's assume our slots only accept
// one parameter.
#include <vector> // for the delegate list inside Signal

template< typename TSlotParam >
class Delegate
{
public:
    virtual ~Delegate() {}
    virtual void Invoke( TSlotParam v ) = 0;
};

template< typename TSlotParam, class TObserver >
class TypeSpecificDelegate : public Delegate< TSlotParam >
{
public:
    TObserver* m_pObserver;
    void (TObserver::*m_fpSlot)( TSlotParam );

public:
    TypeSpecificDelegate(
        TObserver* pObserver,
        void (TObserver::*fpSlot)(TSlotParam) )
    : m_pObserver( pObserver )
    , m_fpSlot( fpSlot )
    {
        // do nothing
    }

    void Invoke( TSlotParam v )
    {
        // Call the stored method on the stored observer.
        ( m_pObserver->*m_fpSlot )( v );
    }
};

template< typename TSlotParam >
class Signal
{
    typedef Delegate< TSlotParam > TDelegate;
    typedef std::vector< TDelegate* > DelegateList;

    DelegateList m_oDelegates;

public:
    template< class TObserver, class TObserverDerived >
    void Connect(
        TObserverDerived* pObserverDerived,
        void (TObserver::*fpSlot)(TSlotParam) )
    {
        // This gives us the correct pointer in cases where
        // multiple inheritance is involved.
        TObserver* p = static_cast< TObserver* >( pObserverDerived );

        // Add our delegate to the list
        m_oDelegates.push_back(
            new TypeSpecificDelegate<TSlotParam, TObserver>(p, fpSlot)
        );
    }

    void Emit( TSlotParam v )
    {
        for ( typename DelegateList::iterator i = m_oDelegates.begin();
              i != m_oDelegates.end();
              ++i )
        {
            (*i)->Invoke( v );
        }
    }
};

This has the essence of a fairly standard C++ “functor” implementation: we hide the heterogeneity of the observer types using an abstract base class, and use a virtual “thunk” method for each observer type to handle the actual invocation.
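For completeness, here’s how the sketch above might be used; Piano stands in for the hypothetical class from the introduction:

class Piano
{
public:
    void PlayNote( char nKey ) { /* ... */ }
};

void Example()
{
    Signal< char > oKeySignal;
    Piano oPiano;

    // TObserver is deduced from the member function pointer, so the
    // caller never has to spell out the observer's type.
    oKeySignal.Connect( &oPiano, &Piano::PlayNote );

    // Calls oPiano.PlayNote( 'c' ) via the stored delegate.
    // (Note: the sketch leaks its delegates; cleanup is omitted for brevity.)
    oKeySignal.Emit( 'c' );
}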

Tangentially, it does seem as if C++11’s lambda would obviate this kind of trick, although I’m not sure whether lambda objects implement enough operators (e.g., can they be compared?) to be inserted into standard library containers, to say nothing of whether your CTO feels like C++11 is mature enough for production code.

Removing virtual functions and inheritance from delegates

While I haven’t always been of this mindset, I believe that if you can limit your code to the above level of complexity, then by all means do so. Nevertheless, some might balk at the use of inheritance here, which tends to produce certain undesirable consequences at runtime, including allocating to the heap and virtual function dispatch. As it turns out, numerous ingenious approaches (summarized nicely in this StackOverflow thread) have been proposed to deal with these concerns, which seems, on the one hand, to be a testament to C++’s impressive (some would say byzantine) flexibility, but perhaps also a sign that something like this really should just be part of the language.


By far the most famous of these approaches is Don Clugston’s FastDelegate library, which was publicized in 2005 and remains, for my money, the best candidate for the job. Clugston’s implementation is based on the observation that instance method calls all look basically the same at the assembly level, where types don’t matter, and engages in some standard-violating thuggery to coerce member function pointers from disparate classes into living within the same delegate type. Invoking these delegates is as fast as any non-optimized function call.

As an added bonus, indeed one that basically no other delegate library seems to offer (including, apparently, C++11 lambdas), Clugston’s delegates are comparable by value, which implies that they are practical for use with container classes. To see why comparability might be important, imagine implementing signals with a delegate library that didn’t let you compare delegates. How would you disconnect a particular method-object pointer pair if you couldn’t compare it with the delegates already stored in the signal?
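To make that concrete, here’s roughly what disconnection amounts to once delegates are comparable by value, which is what FastDelegate gives you. The types and names below are illustrative stand-ins, not jl_signal’s or FastDelegate’s actual internals:

#include <vector>

// Illustrative stand-in for a value-comparable delegate. FastDelegate
// supplies something equivalent (with far more machinery inside).
struct ComparableDelegate
{
    void* pObserver;
    void (*pThunk)( void*, char ); // type-erased call; details glossed over

    bool operator==( const ComparableDelegate& rhs ) const
    {
        return pObserver == rhs.pObserver && pThunk == rhs.pThunk;
    }
};

typedef std::vector< ComparableDelegate > DelegateList;

// Disconnection is just a search-and-erase, and the search is impossible
// to express without some way of comparing delegates.
void Disconnect( DelegateList& oDelegates, const ComparableDelegate& oTarget )
{
    for ( DelegateList::iterator i = oDelegates.begin();
          i != oDelegates.end();
          ++i )
    {
        if ( *i == oTarget )
        {
            oDelegates.erase( i );
            return;
        }
    }
}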

Automatic disconnection

If you’re with me so far, then you will recognize that signals are basically just big lists of pointers. This is a rather serious concern if your signals connect to observers that are deleted over the course of program execution. How can we prevent the pointers cached inside signal objects from dangling?

You could start, as I did, by simply ignoring the problem. Within the observer pattern, there are two explicit parties, the observers and the observed, and a third implicit party, which is whatever or whoever actually does the connecting between the observers and the observed. At a glance, it seems reasonable to enforce a Boy Scout Campsite Principle of sorts (i.e., leave the campsite as clean or cleaner than when you got to it), and place the responsibility of disconnection on the code that created the connections in the first place.

// The "Boy Scout" principle of signal disconnection:
// clean up your own mess!
class ExampleContext
{
public:
    void Init()
    {
        m_pButton = new Button();
        m_pWindow = new Window();
        m_pButton->ClickSignal()->Connect( m_pWindow, &Window::Open );
    }

    void Deinit()
    {
        m_pButton->ClickSignal()->Disconnect( m_pWindow, &Window::Open );
        delete m_pWindow; m_pWindow = 0;
        delete m_pButton; m_pButton = 0;
    }

private:
    Button* m_pButton;
    Window* m_pWindow;
};

In practice, this quickly leads to an untenable mess. Having to accompany every delete statement with additional cleanup is the stuff maintenance nightmares are made of. What’s more, unless you’re exceptionally careful about your abstractions, the system you implement may need to delete your observer objects outside the context that created the signal/slot connections, which means you lose any parallel niceness between the code that connects and the code that disconnects.

On an aesthetic level, too, you will note that a lot of your disconnection code looks the same: you list out the connections an observer has made and disconnect them one by one. If you do that enough, you start to wonder if you couldn’t just put a list of connections into the observer itself, and process that list in the observer’s destructor.

What this leads to is a common base class for observers, something which I’d initially tried to avoid, but seemed necessary if I really wanted to enforce certain bookkeeping rules upon my observer objects. Fortunately, since there are no virtual functions to overload, it’s a relatively unobtrusive form of inheritance. We just inherit from the observer base class and then politely pretend like it never happened.
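Here’s the shape of the idea in a minimal sketch (these are not jl_signal’s actual classes, and the real bookkeeping is more involved):

#include <list>

class SignalObserver;

// Stand-in for "any signal type". In the real library this would walk the
// signal's connection list and remove every delegate owned by the observer;
// here that work is elided.
class SignalBase
{
public:
    void DisconnectObserver( SignalObserver* /* pObserver */ ) { /* ... */ }
};

class SignalObserver
{
public:
    ~SignalObserver()
    {
        // The whole point: no manual cleanup at the call site. Every signal
        // we ever connected to gets told to forget about us.
        for ( std::list< SignalBase* >::iterator i = m_oConnectedSignals.begin();
              i != m_oConnectedSignals.end();
              ++i )
        {
            (*i)->DisconnectObserver( this );
        }
    }

    // Signals call this when a connection to the observer is made.
    void NoteConnection( SignalBase* pSignal )
    {
        m_oConnectedSignals.push_back( pSignal );
    }

private:
    std::list< SignalBase* > m_oConnectedSignals;
};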


Simplifying the C++ syntax

Signal classes in jl_signal are instantiations of template classes, with one template class for every arity (argument count) up to 8. The declarations of these template classes look something like the following:

template< typename P1 >
class Signal1 { ... };

template< typename P1, typename P2 >
class Signal2 { ... };

...

template< typename P1, typename P2, typename P3, ..., typename P8 >
class Signal8 { ... };

Note how each template class embeds its arity value in its name. From the perspective of someone using the API, it seems redundant and tedious to specify the arity in the typename; after all, you will immediately follow this with a list of types whose size is exactly that arity. Doesn’t it seem like there should be a way for the compiler to deduce this automatically?

C++ does not allow you to do this directly. To wit, you cannot do the following:

template< typename P1 >
class Signal { ... };

template< typename P1, typename P2 >
class Signal { ... };

typedef Signal< int > SignalInt;
typedef Signal< int, char > SignalIntChar;

The reasons why this is disallowed are arcane. At first glance, it seems like the compiler ought to have no problem disambiguating between specializations of either template class. And yet, excepting C++11’s variadic templates, there is no straightforward way of arbitrarily expanding the number of arguments in a template specialization. What you are left with, instead, are tricks of varying levels of sneakiness and absurdity. (When your appetite for template magic is sufficiently large, why not try this head-exploder on for size?)

In jl_signal, I settled on the same strategy used by FastDelegate to simplify its own class naming. This strategy exploits the fact that arbitrary function types count as single template parameters.

You don’t lose points here for not knowing what a “function type” is; I’d never heard of, much less used, function types until implementing this library. It was once explained to me that function types were simply the non-pointer counterpart to function pointer types, just as int is the non-pointer counterpart to int*. Since C and C++ do not offer first-class functions, the function type seems to me like the computational analog of the mythical Yeti, or perhaps the sound of one hand clapping.
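If you’ve never seen one spelled out, the distinction looks like this (the typedef names are mine, purely for illustration):

// A function type and its pointer counterpart.
typedef void KeyHandlerFunction( char );    // function type
typedef void (*KeyHandlerPointer)( char );  // pointer-to-function type

void OnKey( char /* c */ ) { /* ... */ }

// You can't create an object of a function type, but you can form
// pointers and references to it, and you can pass it as a single
// template parameter. That is the loophole exploited below.
KeyHandlerFunction* g_pHandler1 = &OnKey; // same type as KeyHandlerPointer
KeyHandlerPointer   g_pHandler2 = &OnKey;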

Regardless of their existential dubiousness, function types allow you to grossly manipulate template syntax when you combine them with some trivial inheritance. We remove arity from signal typenames thusly:

// Forward-declare an empty template class
template< typename SomeFunctionType >
class Signal;

// Specialize the above with a function type and trivially inherit
// from the signal class with the arity in the typename.
template< typename ReturnType, typename P1, typename P2 >
class Signal< ReturnType(P1, P2) > : public Signal2< P1, P2 >
{
    // Nothing required in here. But since this is not exactly
    // the same type as its parent, you might want to implement
    // copy constructors that can translate the parent type to
    // the derived type, and vice-versa.
};

Note that for our immediate purposes, the ReturnType parameter is unused. It still has to be included in declarations, which results in such verbosely unsavory constructions as the following:

typedef jl::Signal< void(int) > SignalInt;
typedef jl::Signal< void(int, char) > SignalIntChar;

In order to obscure the ReturnType component, I resorted to a macro. Regardless of one’s distaste for the preprocessor, I imagine most people would find the results more palatable:

JL_SIGNAL() oSignalThatEmitsNoParams;
JL_SIGNAL( int ) oSignalThatEmitsInt;
JL_SIGNAL( const char* const* ) oEmitsConstArraysOfConstStrings;
JL_SIGNAL( unsigned, double, size_t, char**& ) oSignalThatIsComplicated;
JL_SIGNAL( JL_SIGNAL()& ) oSignalThatIsMeta;
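I won’t pretend the following is the library’s exact definition, but a variadic macro along these lines does the trick (assuming your preprocessor supports variadic macros, which most compilers have for years):

// Sketch of the idea, not necessarily jl_signal's actual macro: splice the
// emitted parameter list into a void function type.
#define JL_SIGNAL( ... ) jl::Signal< void( __VA_ARGS__ ) >

// So that:
//     JL_SIGNAL( int, char ) oSignal;
// expands to:
//     jl::Signal< void( int, char ) > oSignal;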

Anyway, while we’ve managed to simplify life for the API user, things are unavoidably ugly for the API implementer, who is still burdened with the task of populating a header file with nine nearly-identical template classes, one for each arity (again, C++11 notwithstanding). At first I used the C preprocessor for this, which predictably made debugging a living hell, and after a while regressed to expanding the macros and manually propagating any subsequent change nine times over. This tactic proved to be as stultifying as it was bug-prone. Shortly thereafter I achieved a Zen-like state of madness and actually wrote an entire macro language in order to produce this one header, the embarrassing results of which have long since been scattered into the digital ether.

Finally, when tidying up the library for this article, I settled on using ERB, which, owing to a syntax that looks nothing like C++ code, turned out to be a pretty pleasant experience. For what it’s worth, I’m hardly alone when it comes to building custom preprocessors for tediously repetitive code. Clugston more or less did the same with FastDelegate.

Eliminating heap allocations

jl_signal’s original use case was for building user interfaces in console games, a particularly resource-limited category of software whose practitioners tend to disdain heap-based dynamic memory allocation. This is not entirely without reason. For one thing, it can be pretty slow, but more than this, it risks inefficient memory use due to fragmentation.

jl_signal, with its penchant for linked lists, poses a serious fragmentation risk. Each connection allocates a couple of tiny objects (three to five ints’ worth of space) that will absolutely shred your contiguous free memory if you’re not careful.

One strategy for containing the fragmentation issue is to subdivide your free store into multiple purpose-built heaps. Instead of using the system’s malloc(), you can create your own heaps that offer up memory from their own internal buffers, and then use placement new to create your objects. This provides some structural guarantee that allocations in different heaps won’t interfere with each other.
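A bare-bones sketch of the approach (the names are hypothetical, and alignment and bounds checking are glossed over):

#include <cstddef>
#include <new> // for placement new

// A purpose-built "heap": a static buffer dedicated to one object type.
struct Connection { void* pObserver; void* pSlot; Connection* pNext; };

static unsigned char s_aConnectionHeap[ 64 * sizeof(Connection) ];
static std::size_t s_nNextSlot = 0;

Connection* AllocateConnection()
{
    // Carve the next fixed-size slot out of our private buffer and
    // construct the object in place. The system heap is never touched,
    // and allocations made here can't fragment memory used elsewhere.
    void* pSlot = s_aConnectionHeap + s_nNextSlot * sizeof(Connection);
    ++s_nNextSlot;
    return new ( pSlot ) Connection();
}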


Alternatively, if you know something ahead of time about the size of allocations you’re going to make, you can use fixed-size block allocators, which just might be the holy grail of performance-sensitive dynamic memory allocation. Pretty much any performance- or fragmentation-related concern associated with memory allocation arises from the fact that the runtime has to deal with allocations of different sizes. If your application can be made to consume allocations of identical size, then you have categorically eliminated the fragmentation issue, and you have reduced allocator operations down to what amounts to simple linked list manipulation.

Luckily, jl_signal only makes two different kinds of allocations, each of an exact size. So long as we can make a hard estimate regarding the maximum number of connections our program will make, we can go ahead and reserve a fixed chunk of memory for the entire API at program initialization and call it a day. Assuming this would be a good thing for most applications, I went ahead and packaged the library with jl::ObjectPool, a simple fixed-size block allocator, and jl::DoublyLinkedList, a linked list class that can be configured to use custom allocators at runtime.

On a tangential note, it’s interesting to explore how a fixed-size block allocator would work without employing dynamic memory allocation itself. After all, if the allocator is ultimately a list of free memory chunks, then where do the nodes of this list get allocated? The solution, which I gamely stole from Boost and will take no credit for, is simply to cannibalize the free memory chunks themselves, using each to store a pointer to the next free chunk. A minimal allocator implementation would thus require only a single pointer, referencing the head of the free list.

Diagram of a block allocator's free list
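Here’s that trick in a minimal sketch (not the actual jl::ObjectPool; it assumes every block is at least pointer-sized, and it ignores alignment):

#include <cstddef>

class FixedBlockAllocator
{
public:
    FixedBlockAllocator( void* pBuffer, std::size_t nBlockSize, std::size_t nBlockCount )
        : m_pFreeHead( 0 )
    {
        // Thread every block in the buffer onto the free list.
        unsigned char* p = static_cast< unsigned char* >( pBuffer );
        for ( std::size_t i = 0; i < nBlockCount; ++i )
        {
            Free( p + i * nBlockSize );
        }
    }

    void* Alloc()
    {
        if ( ! m_pFreeHead )
        {
            return 0; // out of blocks
        }

        // Pop the head of the free list. The head block's own bytes hold
        // the pointer to the next free block.
        void* pBlock = m_pFreeHead;
        m_pFreeHead = *reinterpret_cast< void** >( m_pFreeHead );
        return pBlock;
    }

    void Free( void* pBlock )
    {
        // Cannibalize the freed block to store the old head pointer,
        // then make it the new head.
        *reinterpret_cast< void** >( pBlock ) = m_pFreeHead;
        m_pFreeHead = pBlock;
    }

private:
    void* m_pFreeHead; // the single pointer mentioned above
};

Note that the allocator itself never calls new or malloc(); it only shuffles pointers around inside the buffer it was handed.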

For what it’s worth, this kind of memory re-use seems about as close as you can get to a Platonic use case for union or reinterpret_cast, C/C++ language devices that are otherwise frequent accessories to all sorts of type-unsafe hackery and mayhem.

Catastrophic delegate behavior, or why you might not want to use this API

jl_signal does nothing fancy when you call Emit() on signal objects. Under the hood, you have a doubly-linked list, traversed by your standard iterator that follows next pointers. This is untroubling until you consider code such as this:

JL_SIGNAL() oSignal;

void f1() { oSignal.Disconnect( &f1 ); }
void f2() { /* do something */ }

void Example()
{
    oSignal.Connect( &f1 );
    oSignal.Connect( &f2 );

    // f1() will disconnect itself while this emission is in progress.
    oSignal.Emit();
}

Calling Disconnect() on a signal while the same signal is still processing an Emit() certainly sounds like trouble. The current implementation of jl_signal does not handle this well at all: as soon as a slot is disconnected, the internal delegate node is freed and the iterator is left with a dangling pointer.

// A sample Emit() method on a signal object for slots that take
// two arguments.
void Emit( TArg1 a1, TArg2 a2 ) const
{
    // Here, the iterator is just a pointer to a list node.
    // Incrementing the iterator means that we just follow the node's
    // "next" pointer. But what happens if the current node is freed
    // before the iterator can be incremented?
    for ( ConnectionListIter i = m_oConnections.begin();
          i.isValid();
          ++i )
    {
        i->delegate.invoke( a1, a2 );
    }

    // I also considered the following design, which solves the
    // above problem by moving it to a different location.
    // What if, by invoking the delegate, we disconnect the NEXT
    // delegate? We're still left with a corrupt iterator.
    //
    //     ConnectionListIter i = m_oConnections.begin();
    //     while ( i.isValid() )
    //     {
    //         ConnectionNode* c = *i;
    //         ++i;
    //         c->delegate.invoke( a1, a2 );
    //     }
}

One can imagine all sorts of enormously gnarly hacks to compensate for the iteration problem, e.g. a signal that tracks its own iterators and notifies them when they get modified (ugh), or maybe an iteration mechanism that chooses its next delegate based on the set-difference between the complete list of delegates and the list of delegates already invoked (blech). I sort of suspect that there is no aesthetically-pleasing solution.

This problem seems to speak to a general weakness not only in the jl_signal library, but also in the signals/slots idiom and perhaps even the observer pattern at large. The general mechanic exists to obscure secondary side-effects at the source level, which is very useful, but the same obscurement can make it extremely difficult to see when those side-effects could potentially interfere with the mechanic itself.


After a cursory review of other similar (read: relatively bespoke and not widely-used) libraries, I gather that most implementers just choose to ignore the issue. Even in cases of more battle-tested APIs, this problem doesn’t seem to be resolved with anything resembling intuitive cleanliness. For example, jQuery appears to copy event callbacks into an immutable list immediately before invoking them, such that a single event broadcast will invoke all attached callbacks, regardless of whether later callbacks may be disconnected as a result of earlier callbacks. As for Qt, I’m sure the problem is alleviated somewhat by the fact that all participating classes are subsumed under a hermetically-sealed object system, but the literature suggests that it’s not a trivial problem, even for the API that champions the signals/slots paradigm.
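Translated into the terms of the Emit() method above, the jQuery-style mitigation would look roughly like the sketch below; it assumes the connection list exposes a copyable delegate type, and it is not what jl_signal currently does:

void Emit( TArg1 a1, TArg2 a2 ) const
{
    // Snapshot the delegates that are connected right now...
    std::vector< TDelegate > oSnapshot; // TDelegate: copyable delegate type (illustrative)
    for ( ConnectionListIter i = m_oConnections.begin(); i.isValid(); ++i )
    {
        oSnapshot.push_back( i->delegate );
    }

    // ...then invoke the snapshot. Slots that connect or disconnect
    // during emission only affect future emissions, never this one.
    for ( std::size_t n = 0; n < oSnapshot.size(); ++n )
    {
        oSnapshot[ n ].invoke( a1, a2 );
    }
}

Of course, the snapshot itself costs a heap allocation per emission, which cuts directly against the no-allocation goal described earlier, so it is a trade rather than a free fix.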