1. 14

  2. 3

    I wanted to ask the rationale behind the jitter, but I googled a bit. If you have a large number of clients connecting at roughly the same time and they wait the same amount of time before trying again, of course all clients will try again at roughly the same time, so you will probably have the same problem again. In the article the perspective is mostly from a single client so this didn’t directly come to mind, but it seems super obvious now.
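
    To make that concrete, here is a minimal sketch of the effect, assuming 1000 clients that all failed at t=0 and a made-up base delay of 5 seconds. The function name `retry_times` and all of the parameter values are illustrative assumptions, not taken from the article or from any particular library.

```python
import random


def retry_times(n_clients, base_delay, jitter):
    """Return the time at which each client makes its first retry.

    Every client failed at t=0 and waits base_delay, stretched or shrunk
    by a random fraction of up to +/- jitter. (Hypothetical helper.)
    """
    return [
        base_delay * (1.0 + random.uniform(-jitter, jitter))
        for _ in range(n_clients)
    ]


random.seed(0)  # fixed seed so the run is repeatable

no_jitter = retry_times(1000, base_delay=5.0, jitter=0.0)
with_jitter = retry_times(1000, base_delay=5.0, jitter=0.5)

# Without jitter every client retries at exactly the same instant,
# so there is only one distinct retry time.
print(len(set(no_jitter)))

# With jitter the retries land somewhere in the 2.5 s to 7.5 s window.
print(min(with_jitter), max(with_jitter))
```

    Without jitter the server sees the same burst again 5 seconds later; with it, the same number of retries arrives spread over several seconds.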

    1. 2

      This is known as the Thundering Herd problem.


      1. 1

        Coincidentally, I’ve read about this phenomenon a couple of times in the past few days.

    2. 1

      I remember, about 18 years ago, finding the following comment before the definitions of the factor and jitter constants in Twisted’s ReconnectingClientFactory:

          # Note: These highly sensitive factors have been precisely measured by
          # the National Institute of Science and Technology.  Take extreme care
          # in altering them, or you may damage your Internet!

      That comment’s still there. No, I didn’t quote it from memory.

      I just wish there had also been a warning in there about the maxDelay constant. But then, even having read this article, I think maybe a maxDelay of 3600 seconds (1 hour) is too much. I remember first shortening that constant in a project about 16 years ago, where the connection wasn’t between two services, but between a user’s home PC and a server, in a product that allowed users to access their home PCs remotely. I was pretty sure that a maximum delay of an hour was too much in that case.
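
      As a rough sketch of what the cap does, the delay schedule can be written out as below. The generator `backoff_delays`, the initial delay of 1 second, and the factor of 2 are assumptions chosen for readability; they are not Twisted's actual defaults. Only the 3600-second cap comes from the discussion above.

```python
import itertools


def backoff_delays(initial_delay, factor, max_delay):
    """Yield successive retry delays: multiply by factor, clamp at max_delay.

    Hypothetical helper; jitter is left out to keep the schedule readable.
    """
    delay = initial_delay
    while True:
        yield delay
        delay = min(delay * factor, max_delay)


# First 15 delays with a one-hour cap versus a one-minute cap.
hour_cap = list(itertools.islice(backoff_delays(1.0, 2.0, 3600.0), 15))
minute_cap = list(itertools.islice(backoff_delays(1.0, 2.0, 60.0), 15))

print(hour_cap)
print(minute_cap)
```

      With these assumed values the one-hour cap is reached on the 13th attempt, after which a client may sit idle for up to an hour even though the server came back long ago. With a one-minute cap the worst case wait is a minute, which seems closer to what a user at a home PC would tolerate.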

      But then a few years later, in a project where I didn’t use Twisted, I screwed up the retry behavior in a more prosaic scenario where a desktop app was trying to send an HTTP request to the server. (Yeah, singular server; no load balancer yet.) I don’t think I even did exponential backoff, let alone with jitter. I’m pretty sure that screw-up was at least partly responsible for having to move to higher-end hosting a few months after that product was released.

      1. 2

        Yeah, that comment is not obviously enough a joke, given its attachment to one of the most twisty bits of code in the code base. These days my first recourse when faced with use of ReconnectingClientFactory is to rewrite it as a loop that uses endpoints. This invariably fixes more bugs than it creates.

        The newer backoffPolicy has less jokey defaults.

      2. 1

        With today’s modern technology, any kid can put a quick mock-up together with Django, React, and MongoDB to store recipes and retrieve them by various attributes.

        I think the author is the same person who writes job openings: “We are looking for a starter with at least 3 years of experience with Django, React, MongoDB, microservices, devops, UX design, and MVC frameworks. You should also have experience with payment systems and be super passionate about dogfood.”

        1. 1

          hmmm, dog food!