1. 16

  2. 4

    Re-learned this the hard way two weeks ago. My TCP connection’s default timeout was apparently > 2 minutes, and I had a timer running every 30 seconds. So a) I got a hang of > 2 minutes on the scheduler, and b) afterwards all the blocked jobs ran at once, triggering a DDoS block. Luckily I found this during testing.
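
    A minimal sketch of the fix in Node-flavoured TypeScript, assuming a scheduler tick that opens a TCP connection (the host name and numbers are illustrative): give the socket an explicit inactivity deadline so a dead peer fails the current tick fast instead of stalling everything for the OS default.

    ```ts
    import net from "node:net";

    // Hypothetical job endpoint; the point is the explicit deadline.
    const socket = net.connect({ host: "jobs.example.com", port: 443 });

    // Without this, a dead peer can hang for the OS default (minutes),
    // and every 30-second tick queues up behind it.
    socket.setTimeout(10_000); // treat 10 s of silence as failure

    socket.on("timeout", () => {
      // 'timeout' only notifies; the socket must be closed explicitly.
      socket.destroy(new Error("connection timed out"));
    });

    socket.on("error", (err) => {
      // Fail this tick fast; the next run gets a fresh attempt instead
      // of all blocked runs firing at once and tripping the DDoS block.
      console.error("tick failed:", err.message);
    });
    ```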

    1. 2

      All very true, yet at the same time it’s often hard to know what the timeout should be, which is probably why it’s left at infinity by default. And the reason is the exact argument he’s making: the network is not reliable.

      So let’s say you set it to 20 seconds, which is plenty of time since the call finishes in 1 second on your machine. Now your app is used in a country with a very slow connection, and it becomes unusable because the calls keep timing out: they take 30 or 40 seconds, always above your limit.

      So it’s hard to pick a value that works everywhere, on all devices.
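
      One way to soften the “works on my machine” problem is to derive the deadline from latency the app actually observes, rather than from a constant. A rough sketch in TypeScript (window size and multipliers are invented for illustration):

      ```ts
      // Sliding window of recently observed call latencies, in ms.
      const samples: number[] = [];

      function recordLatency(ms: number): void {
        samples.push(ms);
        if (samples.length > 100) samples.shift(); // keep the last 100 calls
      }

      function adaptiveTimeoutMs(): number {
        if (samples.length === 0) return 20_000; // fallback before any data
        const sorted = [...samples].sort((a, b) => a - b);
        const p95 = sorted[Math.floor(sorted.length * 0.95)];
        // Generous headroom above the slow tail, but still bounded.
        return Math.min(p95 * 3, 60_000);
      }
      ```

      On a slow connection the window fills with 30–40 s samples and the deadline adapts upward, instead of every call dying at a fixed 20 s.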

      1. 3

        I don’t think timeouts should be set based on what’s reasonable given the technical equipment at hand. Timeouts should be set based on user expectation for the job being performed. Action video game? When you’re getting into hundreds of milliseconds it’s already too much. Interactive business application? No more than a few seconds. Batch jobs? Depends on their size, but could well be days.

        “But kqr, be realistic. Should people in countries with slow connections not get to use interactive business applications?”

        Sure, they should. But they have to be written in a way that works under the technical limitations where they are to be used. If you write an application to be used somewhere with a slow connection, maybe don’t rely on the connection for interactivity at all? Solve the problem sustainably: do more local processing. Don’t just increase the timeouts and have your users suffer.
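
        To make those tiers concrete, here is a sketch of expectation-based budgets in TypeScript; the numbers are only the rough orders of magnitude named above, not recommendations:

        ```ts
        // Budgets keyed to what the user is doing, not to network quality.
        const timeoutBudgetMs = {
          gameInput: 200,                     // hundreds of ms is already too much
          interactiveUi: 5_000,               // "no more than a few seconds"
          batchJob: 2 * 24 * 60 * 60 * 1_000, // "could well be days"
        } as const;

        async function callWithBudget(
          url: string,
          kind: keyof typeof timeoutBudgetMs,
        ): Promise<Response> {
          // AbortSignal.timeout() aborts the request once the budget elapses.
          return fetch(url, { signal: AbortSignal.timeout(timeoutBudgetMs[kind]) });
        }
        ```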

        1. 1

          > All very true, yet at the same time it’s often hard to know what the timeout should be, which is probably why it’s left at infinity by default. And the reason is the exact argument he’s making: the network is not reliable.

          Definitely true, but I think it would be nice if the libraries didn’t let us shoot ourselves in the foot. You also have to think, for example, about the libraries that are used between services.

        2. 1

          If you follow this advice… always write unit tests that verify you handle a timeout sanely. (A sketch follows at the end of this comment.)

          I have seen sooo much code complicated by timeout logic… that does nothing more useful than if the user had Ctrl-C’d it.

          In fact less useful.

          If I set something to do its thing overnight…

          …I’m going to be pissed off if you got impatient during a small temporary net outage and I find the app timed out and did nothing overnight.

          Conversely, if I know the ’net connection is dead (hey, I tripped over the cable and the router is smashed; it’s dead, I know it), I’ll be fed up and angry that I can’t get my desktop back until you time out 10 minutes later, and furious if you don’t just let me Ctrl-C and carry on with my life.

          ps: The poster child for this was NFS filesystems in The Bad Old Days (or badly configured ones these days)…

          If the server went down, or a ’net connection died, no matter how permanently…

          …any process that accessed any file or directory on the mounted filesystem hung, unkillable in the ‘D’ (uninterruptible sleep) state, until you rebooted the system.

          (You will be amazed at how many directories a shell session will stat or open while it does its thing. If just one of those is on a dead NFS mount….)
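
          A minimal sketch of such a test, using Node’s built-in test runner; `syncOvernight` is a hypothetical stand-in for the unit under test. The assertion is that a timeout leads to something saner than giving up for the whole night:

          ```ts
          import { test } from "node:test";
          import assert from "node:assert/strict";

          // Hypothetical unit under test: an overnight job wrapping a
          // network call. A timeout should mean "retry later", not "abort".
          async function syncOvernight(
            fetchFn: () => Promise<string>,
          ): Promise<"done" | "retry-later"> {
            try {
              await fetchFn();
              return "done";
            } catch {
              return "retry-later"; // a blip must not waste the whole night
            }
          }

          test("a timeout is handled sanely, not as permanent failure", async () => {
            const timesOut = () => Promise.reject(new Error("ETIMEDOUT"));
            assert.equal(await syncOvernight(timesOut), "retry-later");
          });
          ```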

          1. 1

            In the browser it’s not infinity, it’s whatever the browser/OS times out at, which is… hard to test.

            I was working on an offline web app (PWA) and the original author hadn’t set timeouts on anything. Ugh.

            I’d love to find out the exact algorithm Chrome (on Windows) uses for timing out requests. Testing the same code on the same browser would result in different timeout values.
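
            One way out of depending on that undocumented behaviour is to impose your own deadline, so every browser/OS combination behaves the same. A sketch with AbortController (the 10-second figure is arbitrary):

            ```ts
            // Instead of reverse-engineering Chrome's default, set an explicit
            // deadline so behaviour is identical on every browser and OS.
            async function fetchWithDeadline(url: string, ms: number): Promise<Response> {
              const controller = new AbortController();
              const timer = setTimeout(() => controller.abort(), ms);
              try {
                return await fetch(url, { signal: controller.signal });
              } finally {
                clearTimeout(timer); // don't leak the timer on success or failure
              }
            }

            // Fails with an AbortError after 10 s instead of "whenever".
            const res = await fetchWithDeadline("/api/sync", 10_000);
            ```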

            1. 1

              > A good API should be easy to use the right way and hard to use the wrong way.

              Yes, but I would say “the right way must also be the easiest” – quote stolen from J. Pakkanen

              I think of it as a trap: The easy way is something people either “fall into” – bad defaults – or can’t rationally avoid because they don’t feel 110% sure about the difference. There must be no such trap.
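
              A sketch of what “no trap” can look like: make the deadline a required parameter, so the easiest call to write is also the safe one (the names here are invented for illustration):

              ```ts
              interface RequestOptions {
                timeoutMs: number; // required: no infinite default to fall into
              }

              function getJson(url: string, opts: RequestOptions): Promise<unknown> {
                return fetch(url, {
                  signal: AbortSignal.timeout(opts.timeoutMs),
                }).then((res) => res.json());
              }

              // getJson("https://example.com/api");  // compile error: options missing
              getJson("https://example.com/api", { timeoutMs: 5_000 });
              ```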