1. 15
  1.  

    1. 1

      A blackbox is an excellent thing to have from a debugging point of view.

      I think it works especially well if your system is a deterministic state machine and you journal all inputs into the blackbox before you feed them into the state machine. That way you can easily write an application-specific debugger that lets you step through your application (forwards and backwards) and see how its state changed over time leading up to a crash.

      One thing you wouldn’t be able to reproduce by replaying the blackbox after a crash is the environment (e.g. CPU usage by other processes), but this could be recorded periodically into the blackbox as well (as talked about in the article).

      The problem that remains is: what happens when the circular buffer wraps? We’d need to snapshot the state, but we need to be careful about exactly when we do it; otherwise a crash might happen immediately after a snapshot and a buffer wrap, leaving the blackbox empty.

      I know that at this point the application is a lot more complicated than what’s needed in the article, but we’ve also gained fault recovery, and we’re in a good position to get fault tolerance by replicating the journal. Martin Thompson’s talks and the Aeron log buffer implementation are a nice source of inspiration for this type of stuff, by the way.
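
To make the journaling and buffer-wrap points above a bit more concrete, here is a rough Python sketch. Everything in it is made up for illustration: `step`, `BlackBox` and `Machine` are hypothetical names, the state is just an integer, and the "rolling snapshot" (fold the oldest entry into the snapshot right before it gets overwritten) is only one possible answer to the wrap problem, not something taken from the article or from Aeron.

```python
from collections import deque
from dataclasses import dataclass, field


def step(state: int, event: int) -> int:
    """Hypothetical deterministic transition: a running sum of the inputs."""
    return state + event


@dataclass
class BlackBox:
    capacity: int                 # maximum number of journal entries kept
    snapshot: int = 0             # state just before the oldest kept entry
    journal: deque = field(default_factory=deque)

    def record(self, event: int) -> None:
        # About to wrap: fold the oldest entry into the snapshot first, so
        # that snapshot + journal always reproduces the live state and the
        # journal never becomes empty because of a wrap.
        if len(self.journal) == self.capacity:
            self.snapshot = step(self.snapshot, self.journal.popleft())
        self.journal.append(event)

    def replay(self, upto=None) -> int:
        """Rebuild the state after the first `upto` kept entries (default: all)."""
        state = self.snapshot
        for event in list(self.journal)[:upto]:
            state = step(state, event)
        return state


class Machine:
    def __init__(self, capacity: int) -> None:
        self.box = BlackBox(capacity)
        self.state = 0

    def feed(self, event: int) -> None:
        self.box.record(event)                 # journal the input first...
        self.state = step(self.state, event)   # ...then apply it


if __name__ == "__main__":
    m = Machine(capacity=3)
    for e in [1, 2, 3, 4, 5]:
        m.feed(e)
    print(m.state, m.box.replay(), m.box.replay(upto=2))
```

After a crash, `replay()` gives you the last state back (fault recovery), and calling `replay(upto=k)` with a smaller `k` is the "step backwards" part of the debugger. A real system would also have to persist the snapshot and journal somewhere that survives the crash, which this sketch ignores.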