Using Event -Loop Architectures to Reduce Software Life-Cycle Costs
The Problem
So many of the software problems I've come across in my years of development have stemmed from the proliferation of threads. The number of man-millenia that have been spent tracking down deadlocks, variables that should have been protected by critical sections, and all the other worms in the can of 'multi-threadedness' staggers the imagination. Couldn't all of that time have been put to better use?
Outside of academia, I was first exposed to the problem back in 1984 when I was working for a defense contractor and using a new programming language called 'Ada'. Ada was one of the first major programming languages to support multi-threading natively in the form of 'task' constructs. This was a great thing in the same way that a sharp knife is a great thing in the kitchen.
There's a saying that goes something like "when you're holding a hammer, every problem looks like a nail." Well, when you're looking to do multiple things in software at the same time, using multiple threads is frequently the implementer's choice, and often because they simply don't know of any other options.
The blame can't be entirely placed upon the developers themselves, since most educational programs didn't then and still don't emphasize alternatives to using multiple threads in programs that are required to handle multiple distinct processing sequences concurrently. In many curricula, the topic begins and ends with a class lecture and an assignment to investigate the 'Dining Philosophers' problem using threads. Going back to the knife metaphor, this is the equivalent of identifying the 'pointy end'.
One of the interesting things about the problems associated with multi-threading is that they transcend the choice of programming language / library being used, and haven't really changed in the 30+ years I've been involved with software development. The same set of problems arise whether using tasks in Ada, POSIX Threads in 'C', the (recently added) Thread support library in C++, or extending the java.lang.Thread class in Java.
One Solution
It stands to reason that if these problems are so ubiquitous, the approach to avoiding these problems may also be language-independent. The good news is that this is actually the case. One such approach is through the use of an 'event loop'.
I believe my first exposure to the event loop concept was when I was looking through the code for the X Window System server for a Unix system in the mid 80's (yes, this was before Linux was even a thing). I was intrigued because the X server was managing connections to multiple clients using a single thread. Upon investigation I discovered the select() call, which is at the heart of almost any UNIX/Linux program that implements an event loop. When the program needs to do some I/O operation or wait for some period of time, that information is encoded into the arguments of the select() call, which when invoked waits for at least one of those events to become actionable before returning control to the caller. When the select() call completes, information about which events are actionable is available to the caller, which typically services one or more of those events before invoking the select() statement again, typically withing a loop. Hence the name 'event loop'. Handling of the actionable events are typically in the context of some sort of callback via a well-defined function prototype, allowing the implementation of the event loop to be largely independent of the application logic.
A natural consequence of using an event loop is that when an event is being serviced, it must not cause the thread to block (otherwise, application liveness can be compromised, possibly to the point of full deadlock). The code servicing the event must be prepared to handle partial I/O reads and writes, for example, buffering data across possibly multiple events in order to maintain that liveness. This in turn leads to much of an application's code being implemented as state machines, with the servicing of events causing transitions within those state machines.
When One Thread Isn't Enough
Unfortunately, we can't use a single thread in every situation. I recently had to interface with a vendor's hardware device, and that vendor's library API implemented blocking behavior for all communications with the device, which meant that two additional threads (one for reading and one for writing) had to be included in the application.
How then to best insulate the rest of the program from the potential pitfalls that using threads presents? The answer comes in the form of adding a command queue to the event loop, which allows arbitrary code to be handed off to the event loop for subsequent execution on the event loop's thread.
In my case, the class for interfacing with the vendor's device hides the fact that there are threads running behind the scenes. For example, when data from the device is needed, a read method is invoked, including an argument referencing a callback method to invoke when the read operation completes. When the read operation completes (on the read thread), the command queue is used to invoke the callback on the event loop's thread. This hides the fact that threads are being used for the purpose, and allows the implementation to change when the vendor updates the API to support non-blocking I/O.
Another not-uncommon justification for using multiple threads is when a lengthy calculation (one that would impact the liveness of the program if it were performed on the event-loop thread) is necessary. This situation is similarly addressed by invoking a callback via the command queue when the calculation has completed.
The Mathematics Behind Multi-Threading
Over the years, I've had a number of discussions with architects / designers / programmers on this topic. The argument that I hear time and again is:
"But straight-line code is so much more understandable to the reader. I can read a line of input, parse that input into data, process that data and update the program state more easily than I can build a state machine to do essentially the same thing..."
The point is valid. On the surface that code is more understandable to the reader (even with the additional synchronization mechanisms necessary to deal with reentrancy). However, consider the mathematics of the situation:
Using a very simple model of a program, let's say that a program's state is entirely dependent on its program counter(s), and that any particular function takes a thousand instructions to implement.
In the single-threaded case, a program implementing one function has a thousand different program states, a program implementing two functions has two thousand, a program implementing three functions has three thousand, and so on. There is a linear relationship between the amount of code written and the run-time complexity of the program.
In the multi-threaded case (one thread per function), a program implementing one function has a thousand different program states, a program implementing two functions has a million, a program implementing three functions has a billion, and so on. There is an exponential relationship between the amount of code written and the run-time complexity of the program.
It is readily apparent that in a multi-threaded environment, a point is quickly reached where it becomes impossible to test every possible program state, or even a significant fraction thereof. This is why many multi-threaded programs will have bugs that appear only rarely and which cannot be reproduced with any kind of regularity. The cost to find and fix these bugs can be enormous.
The implications of the mathematics are obvious. Any extra time it may take to design and maintain an event-loop-based program is justifiable for a project of any significant size, if only due to the reduced run-time complexity of the program.
Summary
I've applied event-loop architecture to every piece of new development I've been involved with over the last decade, ranging from automotive software running bare on embedded microcontrollers to complex systems of cooperating Java processes running under Linux.
My personal experience has been that using an event-loop architecture rather than a multi-threaded architecture results in the following benefits:
- Reduced Development Time - This is because developers can concentrate on the essential characteristics of solving the problem at hand, rather than trying to figure out where concurrent-modification issues may arise and protecting against them.
- Reduced Testing Effort - In event-loop architectures, possible flows of control through a module are well-defined and can be tested exhaustively. No such claim can be made for modules in multi-threaded architectures.
- Reduced Integration Effort - In a multi-threaded environment, it is virtually impossible to test a module under the same conditions that it will experience in the target application, and so it is normal for problems to be uncovered during integration. Since modules can be tested exhaustively in an event loop environment, fewer problems are found during integration.
- Improved Reuse Characteristics - Software modules written for multi-threading environments often have specific conditions (typically restrictions on reentrancy) under which they can be used. Potential reuse scenarios may not adhere to those conditions. Software modules written for event-loop environments have fewer of these conditions attached and so are more portable.
In conclusion, the use of event-loop architecture results in reduced time-to-market and lower life-cycle costs, allowing effort to be focused on creating the next generation of a product, rather than fixing the previous generation.
For information about how StoutWare Engineering can help you reduce time-to-market and product costs, drop us a message via our 'Contact Us' form.