1. 14
  1. 6

    That’s a good reminder that once you start dealing with Promises and async/await, unexpected things can happen between lines of code if you’re not mindful.

    1. 5

      yep, it’s really easy to find yourself with “stale state” if you’re not careful, especially when working with closures and the like.

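      async function doThing(data, baz) {
        // Other code can run (and touch `data`) while this call is pending.
        const result = await doAPICall(data.foo);
        // By the time we get here, `baz` and `result` may already be stale.
        data.bar = result + baz;
      }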
      

      stuff like this gets really messy (and hard to catch when your action is supposedly idempotent!) because your operation depends not only on an async operation, but also on state captured at the initial function call. So your doThing(data, 'a'); doThing(data, 'b') might end up completely breaking the supposed invariants of your data structure.
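
      A minimal sketch of how the two calls can interleave, assuming the API responses come back out of order. The `doAPICall` here is a made-up stand-in that just waits for a given number of milliseconds; the extra `ms` parameter exists only to force the ordering.

      ```javascript
      // Hypothetical stand-in for doAPICall: resolves with `foo` after `ms` milliseconds.
      function doAPICall(foo, ms) {
        return new Promise((resolve) => setTimeout(() => resolve(foo), ms));
      }

      async function doThing(data, baz, ms) {
        const result = await doAPICall(data.foo, ms); // other calls can resume while we wait
        data.bar = result + baz; // whichever call resumes last wins
      }

      async function demo() {
        const data = { foo: 'x', bar: null };
        // 'a' is requested first, but its response arrives last.
        await Promise.all([doThing(data, 'a', 30), doThing(data, 'b', 5)]);
        return data.bar; // 'xa': the older call overwrote the newer 'xb'
      }
      ```

      Nothing here is a data race in the threads sense; each line runs atomically. The problem is only that the write happens long after the arguments were captured.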

      (we had a nasty bug like this with a dropdown with fuzzy searching + autocomplete, where you could really easily end up with stale results, I wrote a bit about it a couple years back).
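
      One common guard against that kind of stale autocomplete result, as a rough sketch: tag each request and drop any response that is not for the latest one. This is not necessarily how the bug above was fixed, and `createSearchBox`, `fakeSearch`, and the delays are all invented for illustration.

      ```javascript
      // Hypothetical search box: only the most recent query may update `results`.
      function createSearchBox(search) {
        let latestRequest = 0;
        const box = {
          results: [],
          async onInput(query) {
            const requestId = ++latestRequest;
            const found = await search(query);
            // A newer query was issued while we waited: drop this stale response.
            if (requestId !== latestRequest) return;
            box.results = found;
          },
        };
        return box;
      }

      // Fake backend in which the shorter (earlier) query responds more slowly.
      function fakeSearch(query) {
        const delay = query.length === 1 ? 30 : 5;
        return new Promise((resolve) =>
          setTimeout(() => resolve([`match for ${query}`]), delay));
      }

      async function demoSearch() {
        const box = createSearchBox(fakeSearch);
        await Promise.all([box.onInput('f'), box.onInput('fo')]);
        return box.results; // only the response for 'fo' was kept
      }
      ```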

    2. 4

      I remember hearing that, despite ATMs/banks being the “canonical” example of race conditions being Very Bad(TM), actual ATMs are totally susceptible to race conditions and there are just batch reconciliation processes to handle stuff instead (and, I imagine, an expectation that stuff “higher in the stack” like laws, security cameras, etc. will handle issues).

      I have no idea if this is true but I choose to believe it

      1. 3

        It used to be true and I know of real stories of good thugs hunting down bad thugs after exploiting the issue, but that was back in the early nineties.

        It is much less of a problem in the era of ubiquitous connectivity, but of course there is still some slack in the name of usability, and that is indeed handled “higher in the stack”.

      2. 2

        This is not a Node-specific problem; stale state is a matter of architecture. But yes, you can definitely get these and a lot more quirky problems with the event loop.

        1. 2

          This is exactly the kind of problem where Weeks of Debugging Can Save You Hours of TLA+ and I would even paraphrase it as “Weeks of Sprinkling Mutexes Around the Code Base Can Save You Hours of TLA+”.

          In general, the problem described in the article falls into the “lost update” or TOCTOU (time-of-check to time-of-use) categories and should be solved not by adding mutexes, but by redesigning your data structures and the operations on them so that they stay consistent in the face of all possible event reorderings. Adding mutexes to business logic code should always be considered a dangerous and suboptimal approach.

          In the current case, the first step would be replacing the “saving” of the full balance with an increment (not unlike the SQL example given in the article itself).
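
          A rough sketch of the difference, using an in-memory object as a stand-in for the database row. `createAccount`, `tick`, and the amounts are invented for illustration; in a real store the increment would be a single atomic statement rather than a JavaScript `+=`.

          ```javascript
          // Simulates one asynchronous round trip to a store.
          const tick = () => new Promise((resolve) => setTimeout(resolve, 1));

          // Hypothetical in-memory stand-in for an account row.
          function createAccount(balance) {
            const row = { balance };
            return {
              async read() { await tick(); return row.balance; },
              async save(newBalance) { await tick(); row.balance = newBalance; },
              // No await between reading and writing, so this step cannot be interleaved.
              async increment(delta) { await tick(); row.balance += delta; },
            };
          }

          // Read-modify-write: the full balance is computed from a possibly stale read.
          async function depositBySaving(account, amount) {
            const balance = await account.read();
            await account.save(balance + amount);
          }

          // Commutative operation: the order of arrival no longer matters.
          async function depositByIncrement(account, amount) {
            await account.increment(amount);
          }

          async function demoDeposits() {
            const a = createAccount(100);
            await Promise.all([depositBySaving(a, 10), depositBySaving(a, 5)]);
            const b = createAccount(100);
            await Promise.all([depositByIncrement(b, 10), depositByIncrement(b, 5)]);
            // `saved` has lost one deposit; `incremented` is 115.
            return { saved: await a.read(), incremented: await b.read() };
          }
          ```

          No mutex is involved in the second version; the operation itself was changed so that reordering is harmless.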