1. 4

  2. 3

    The way I deal with this in Scow[1] (my Raft implementation) is that Scow is parameterized over a transport. The transport has an API that defines how Scow will use it, and every function is expected to possibly fail (Scow has well-defined ways of handling failure). A user can then wrap the transport in any policy they want. If you want to retry on failure, you can wrap a TCP transport in a retrying transport. If you want to time out after a certain period, you can wrap that in a timeout transport, and so on.
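    To make the wrapping idea concrete, here is a rough sketch in Python. This is not Scow's actual API: `TransportError`, `FlakyTransport`, `RetryingTransport`, and the `send` signature are all names made up for illustration. The only point is that the wrapper exposes the same interface as the transport it wraps, so the consensus code never needs to know a policy is in place.

```python
class TransportError(Exception):
    """Raised by a transport when an operation fails (hypothetical name)."""


class FlakyTransport:
    """Stand-in for a real network transport: fails the first `failures` sends."""

    def __init__(self, failures):
        self.failures = failures
        self.sent = []

    def send(self, node, message):
        if self.failures > 0:
            self.failures -= 1
            raise TransportError("send failed")
        self.sent.append((node, message))


class RetryingTransport:
    """Same interface as the inner transport, plus a retry policy."""

    def __init__(self, inner, attempts):
        if attempts < 1:
            raise ValueError("attempts must be at least 1")
        self.inner = inner
        self.attempts = attempts

    def send(self, node, message):
        last_error = None
        for _ in range(self.attempts):
            try:
                return self.inner.send(node, message)
            except TransportError as err:
                last_error = err
        # Every attempt failed: surface the failure to the caller,
        # which is expected to handle it.
        raise last_error


# The caller only sees something with a `send` method.
transport = RetryingTransport(FlakyTransport(failures=2), attempts=3)
transport.send("node-1", "append_entries")
```

    A timeout wrapper would follow the same shape, and because each wrapper is itself a transport, they can presumably be stacked in whatever order the user wants.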

    I’ve found it quite nice since it makes testing very easy. I have an in-memory transport, and I can wrap it in a transport that generates random partitions in interesting ways.
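    As an illustration of that testing setup, here is a minimal Python sketch. It is an assumption about how such a wrapper could look, not how Scow does it: `InMemoryTransport`, `PartitioningTransport`, `repartition`, `heal`, and the two-group partition scheme are all hypothetical.

```python
import random


class TransportError(Exception):
    """Raised by a transport when an operation fails (hypothetical name)."""


class InMemoryTransport:
    """Delivers messages by appending them to a per-node inbox."""

    def __init__(self):
        self.inboxes = {}

    def send(self, src, dst, message):
        self.inboxes.setdefault(dst, []).append((src, message))


class PartitioningTransport:
    """Wraps a transport and fails sends that cross a random partition."""

    def __init__(self, inner, nodes, seed):
        self.inner = inner
        self.nodes = list(nodes)
        self.rng = random.Random(seed)  # seeded so a failing test can be replayed
        self.groups = {node: 0 for node in self.nodes}

    def repartition(self):
        # Assign every node to one of two sides at random.
        for node in self.nodes:
            self.groups[node] = self.rng.randrange(2)

    def heal(self):
        # Put every node back on the same side.
        for node in self.nodes:
            self.groups[node] = 0

    def send(self, src, dst, message):
        if self.groups[src] != self.groups[dst]:
            raise TransportError(f"{src} cannot reach {dst}")
        self.inner.send(src, dst, message)


network = PartitioningTransport(InMemoryTransport(), ["a", "b", "c"], seed=1)
network.repartition()
```

    A test could call `repartition` and `heal` between rounds and check that the cluster still converges. More interesting partitions (one-way links, more than two groups) would only change how `groups` is computed.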

    [1] https://github.com/orbitz/scow