1. 7

    I personally find simple TCP proxies great for failure testing. E.g. when you want to know how your application behaves when the connection interrupts suddenly, blackhole situations, timeouts, malformed data, etc. It’s often quite hard to test the failure scenarios in any other way. I’ve never found a ready-to-use solution that works perfectly for all possible cases. Usually, it’s much simpler to write your own very quickly in your favourite language and make it do what you want to do.

    1. 4

      This is a unique type of post. It sparked a whole new stream of language and library feature development. One of the most notable is the design shift of Kotlin coroutines which was well described by one of the lead language designers.

      1. 2

        A very interesting paper! Another example of a metastable failure could be a recent Cloudflare outage where a partial network partitioning caused a repeatedly failing RAFT election cycle which then caused a prolonged degraded state due to a replica rebuild, as described in Michael Pigott on Toward a Generic Fault Tolerance Technique.