Tuesday, October 9, 2012

Handler mechanisms: design patterns

What really is a handler? What is a main handling mechanism? These two questions came to my mind when I started writing this post. Let me explain what I mean by these terms before I start, so that no misunderstanding can arise. A handler is a procedure (stored as a function, method, or functor) that performs an individual task according to the given parameters. Usually there are many handlers that accept the same set of arguments and one main handling mechanism that decides which handler should handle a particular "request". The main handling mechanism is simply a procedure that stores references to all registered handlers and executes the appropriate one on demand. It should hide all unnecessary details from the end user, who does not care how things are really handled. In this post I'll try to describe designs that can be used to implement such a "design pattern".

We start with C. The first thing that comes to mind is a switch statement. Let's see what the code actually looks like by defining three different handler functions and one controller.
#include <assert.h>

int handlerA(int param) { return param + 1; }
int handlerB(int param) { return param + 2; }
int handlerC(int param) { return param + 3; }

int handle(int code, int param)
{
    switch(code)
    {
        case 0: return handlerA(param);
        case 1: return handlerB(param);
        case 2: return handlerC(param);
        default:
            assert(!"No handler registered for provided code");
            return -1; /* reached only when assertions are disabled (NDEBUG) */
    }
}
Nothing really interesting happens here. When new handlers are needed, the switch statement must also be modified. On the other hand, dispatch overhead at run-time is minimal: the set of handlers is fixed at compile time, so the compiler can typically turn the switch into a jump table or a few comparisons. This approach may be good for small projects with a small number of handlers.

Moving forward, we can discover another approach: an array of function pointers.
int handle(int code, int param)
{
    static int (*handlers[])(int) = { handlerA, handlerB, handlerC };
    static const int numHandlers = sizeof(handlers) / sizeof(handlers[0]);
    assert(code >= 0 && code < numHandlers && "No handler registered for provided code");
    return handlers[code](param);
}
In terms of performance there is no meaningful difference between these two approaches. What we gain is fewer characters to type and a more concise (in my opinion) design. Note that we have to add strict bounds checking to avoid SIGSEGVs or other kinds of problems.

We can easily redesign the function array approach to use methods. The code would be pretty similar, so I won't write it down here. However, to improve our design we can use the Boost library as shown below.
class Handlers
{
public:
    int HandlerA(int param) { return param + 1; }
    int HandlerB(int param) { return param + 2; }
    int HandlerC(int param) { return param + 3; }
};

int main(int ac, char ** av)
{
    Handlers handlers;
    std::vector<function<int(int)> > fs;
    fs.push_back(bind(&Handlers::HandlerA, ref(handlers), _1));
    fs.push_back(bind(&Handlers::HandlerB, ref(handlers), _1));
    fs.push_back(bind(&Handlers::HandlerC, ref(handlers), _1));
    ...
}
For a small project this approach may not pay off: the implementation requires additional dependencies. In bigger projects that already use Boost libraries (which is common), this is not a big deal.
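For readers who want to see the plain method-based variant anyway, a minimal sketch without any Boost dependency could look like this. The `Handlers` class is repeated from the snippet above so the block stands alone; the `HandlerMethod` alias and the `handle` signature are made up for illustration.

```cpp
#include <cassert>

class Handlers
{
public:
    int HandlerA(int param) { return param + 1; }
    int HandlerB(int param) { return param + 2; }
    int HandlerC(int param) { return param + 3; }
};

// Hypothetical alias: pointer to a Handlers method taking and returning int.
typedef int (Handlers::*HandlerMethod)(int);

int handle(Handlers & handlers, int code, int param)
{
    static const HandlerMethod methods[] = {
        &Handlers::HandlerA, &Handlers::HandlerB, &Handlers::HandlerC
    };
    static const int numMethods = sizeof(methods) / sizeof(methods[0]);
    assert(code >= 0 && code < numMethods && "No handler registered for provided code");
    // Pointer-to-member call syntax: (object.*pointer)(arguments).
    return (handlers.*methods[code])(param);
}
```

The only real difference from the function-pointer array is the call syntax and the need to pass the object the methods are invoked on.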

The boost::bind documentation says that not using boost::ref results in pass-by-value behavior. I tried that and noticed that a single bind produces eight calls to the copy constructor! Maybe this could be considered premature optimization, but I advise using boost::ref anyway. Keep in mind that with boost::ref the bound object must outlive the function objects that refer to it.

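In all previous approaches there was no possibility of adding handlers at run-time. With a vector of function objects we finally get it: a new handler can be pushed back at any time. On the other hand, performance decreases a little, because every call goes through the function wrapper.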
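As a rough sketch of what run-time registration could mean in practice: since the handlers live in a growable container, a new one can be appended while the program is running. The example assumes a C++11 compiler and uses `std::function`, the standard counterpart of `boost::function`. The names `Dispatcher`, `add`, `handle`, and `plusOne` are illustrative only and do not come from the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical dispatcher: a handler's code is simply its index in the vector.
class Dispatcher
{
public:
    // Registers a handler at run-time and returns the code assigned to it.
    int add(std::function<int(int)> handler)
    {
        _handlers.push_back(handler);
        return static_cast<int>(_handlers.size()) - 1;
    }

    // Throws instead of asserting, so the check also works in release builds.
    int handle(int code, int param) const
    {
        if (code < 0 || static_cast<std::size_t>(code) >= _handlers.size())
            throw std::out_of_range("No handler registered for provided code");
        return _handlers[code](param);
    }

private:
    std::vector<std::function<int(int)> > _handlers;
};

// Example free-function handler.
int plusOne(int param) { return param + 1; }
```

A caller could register `plusOne` (or a lambda) at any point with `add` and later dispatch to it with the returned code.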

Okay, I have just mentioned that new handlers can be "registered" at run-time. However, it's hard (at least for me) to imagine a situation in which this would be helpful. Maybe you have some ideas? In the meantime let's concentrate on more practical approaches.

Imagine a set of handlers that register themselves automatically with the main handling class. Adding a new handler doesn't require any changes in the main handling mechanism. Moreover, there is no central point where all handlers are defined, so we get a "distributed" system. Usually when we solve this kind of problem we end up with something similar to the code shown below.
class IHandler
{
public:
    virtual ~IHandler() {} // allows safe deletion through an IHandler pointer
    virtual void Handle(void * data) = 0;
};

class HandlerA : public IHandler
{
public:
    virtual void Handle(void * data)
    {
        // Handle data here...
    }
};

class MainHandler
{
public:
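    void Handle(int code, void * data)
    {
        // find() avoids operator[], which would insert a null pointer for an unknown code
        map<int, IHandler *>::iterator it = _handlers.find(code);
        assert(it != _handlers.end() && "No handler registered for provided code");
        it->second->Handle(data);
    }

    void RegisterHandler(int code, IHandler * handler)
    {
        _handlers.insert(make_pair(code, handler));
    }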

private:
    map<int, IHandler *> _handlers;
};
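To see what this looks like from the caller's side, here is a small self-contained sketch of the manual registration step. `IncrementHandler` and `registerAll` are hypothetical names, and `Handle` returns a `bool` here (unlike the version above) only so that the missing-handler case is easy to observe.

```cpp
#include <cassert>
#include <map>
#include <utility>

class IHandler
{
public:
    virtual ~IHandler() {}
    virtual void Handle(void * data) = 0;
};

// Hypothetical handler: treats data as int* and increments the value.
class IncrementHandler : public IHandler
{
public:
    virtual void Handle(void * data) { ++*static_cast<int *>(data); }
};

class MainHandler
{
public:
    // Returns false when no handler is registered for the code.
    bool Handle(int code, void * data)
    {
        std::map<int, IHandler *>::iterator it = _handlers.find(code);
        if (it == _handlers.end())
            return false;
        it->second->Handle(data);
        return true;
    }

    void RegisterHandler(int code, IHandler * handler)
    {
        _handlers.insert(std::make_pair(code, handler));
    }

private:
    std::map<int, IHandler *> _handlers;
};

// The explicit registration step has to live somewhere, e.g. in start-up code.
void registerAll(MainHandler & mainHandler, IncrementHandler & increment)
{
    mainHandler.RegisterHandler(1, &increment);
}
```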
This approach forces the developer to add, somewhere, code that registers each class with the central point. How can it be done differently? Take a look at the following snippet.
class IHandler
{
public:
    virtual ~IHandler() {} // allows safe deletion through an IHandler pointer
    virtual void Handle(void * data) = 0;
};

class MainHandler
{
public:
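    static void Handle(int code, void * data)
    {
        // find() avoids operator[], which would insert a null pointer for an unknown code
        map<int, IHandler *>::iterator it = _handlers.find(code);
        assert(it != _handlers.end() && "No handler registered for provided code");
        it->second->Handle(data);
    }

    static void RegisterHandler(int code, IHandler * handler)
    {
        _handlers.insert(make_pair(code, handler));
    }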

private:
    static map<int, IHandler *> _handlers;
};
map<int, IHandler *> MainHandler::_handlers;

template <typename T>
class AutoRegister
{
public:
    AutoRegister()
    {
        MainHandler::RegisterHandler(T::HANDLED_CODE, new T());
    }
    void fun() const { return; }
    ~AutoRegister()
    {
        // OPT: Unregister here...
    }
};

class HandlerA : public IHandler
{
public:
    virtual void Handle(void * data)
    {
        // Handle data here...
    }

    static int HANDLED_CODE;

private:
    static AutoRegister<HandlerA> _autoRegisterer;
};
int HandlerA::HANDLED_CODE = 1;
AutoRegister<HandlerA> HandlerA::_autoRegisterer;

// Example usage:
int main(int ac, char ** av)
{
    MainHandler::Handle(1, NULL);
}
As you can see, there is no registration code inside the handler class itself. Instead, we put two additional static members into the class (HANDLED_CODE and an AutoRegister instance), which automatically register the class with the central handling mechanism. In other words, the registration code is unified and the developer needs only a few lines to define a new handler. We could even employ the CRTP pattern here, so that defining a new handler is as simple as deriving from another type, but I'd prefer to stop before things become uncontrollable. We must keep in mind that static initialization as shown above has its own problems. Arseny Kapoulkine has written a great post about the dangers that come with this approach, "Death by static initialization", and I advise reading it before implementing anything.
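One of those problems is initialization order: if the registry map and a registering object live in different source files, the registration may run before the map has been constructed. A common way around this, sketched below under my own assumptions, is to keep the map in a function-local static so that it is constructed on first use. The `handlers()` accessor, the `bool` return value, and `DoublingHandler` are invented for this example, and the handler object is deliberately never deleted, just as in the snippet above.

```cpp
#include <cassert>
#include <map>

class IHandler
{
public:
    virtual ~IHandler() {}
    virtual void Handle(void * data) = 0;
};

class MainHandler
{
public:
    // Returns false when no handler is registered for the code.
    static bool Handle(int code, void * data)
    {
        std::map<int, IHandler *>::iterator it = handlers().find(code);
        if (it == handlers().end())
            return false;
        it->second->Handle(data);
        return true;
    }

    static void RegisterHandler(int code, IHandler * handler)
    {
        handlers()[code] = handler;
    }

private:
    // Constructed on first call, so it already exists when a registration
    // runs during static initialization.
    static std::map<int, IHandler *> & handlers()
    {
        static std::map<int, IHandler *> instance;
        return instance;
    }
};

template <typename T>
class AutoRegister
{
public:
    AutoRegister() { MainHandler::RegisterHandler(T::HANDLED_CODE, new T()); }
};

// Hypothetical handler: treats data as int* and doubles the value.
class DoublingHandler : public IHandler
{
public:
    virtual void Handle(void * data) { *static_cast<int *>(data) *= 2; }
    static const int HANDLED_CODE = 7;
};

// Registers the handler during static initialization of this file.
static AutoRegister<DoublingHandler> registerDoublingHandler;
```

This only addresses the construction order of the registry itself; the other caveats from the linked post still apply.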

There is a variety of design patterns related to handling mechanisms. Choosing the right one depends strongly on multiple conditions. Switch- and array-based solutions have the advantage of being thread-safe (the dispatch table never changes) and relatively fast. When handlers are supposed to be added at run-time, we can pick an OO-based solution with, e.g., a vector. For convenience some auto-registering functionality may be used, but it should be used with extreme care.

At the end of this post I'd like to compare the C++ solutions with a Python implementation, just to see a dynamically-typed language in action. Take a look at the following three code snippets: the handler interface, an example handler, and the main handling mechanism, respectively.
import abc

class BaseHandler(object):
    __metaclass__ = abc.ABCMeta  # Python 2 metaclass syntax

    @abc.abstractmethod
    def Handle(self, data): pass

    @abc.abstractproperty
    def code(self): pass
class ConcreteHandler(BaseHandler):
    code = 1
    def Handle(self, data):
        pass # Handle here...
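import inspect
import sys

class MainHandler(object):
    def __init__(self):
        # Instantiate every BaseHandler subclass found in this module, keyed by its code.
        self.__handlers = dict([
            (x[1].code, x[1]())
            for x in inspect.getmembers(sys.modules[__name__], inspect.isclass)
            if (issubclass(x[1], BaseHandler) and x[1] != BaseHandler)
        ])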

    def Handle(self, code, data):
        try:
            return self.__handlers[code].Handle(data)
        except KeyError:
            raise RuntimeError('No handler registered for code ' + str(code))
That magic is, however, a natural consequence of using a dynamically-typed language such as Python, which provides, in my opinion, a great balance of power and readability :-).

A natural continuation of this post would be a review of the possibilities in C++11. Do you have any remarks in this area? I'm curious how much it can help here. Maybe it can't improve the design very much, but I wouldn't be surprised to find some killer C++11 feature. Share your ideas in the comments!