1. 17

  2. 23

    Why is it necessary or even desirable to balance worker load? If the busy worker is still getting back to poll fast enough, what's the problem? If it were overloaded, then the other workers would pick up more connections. If anything, concentrating work in one process should aid cache hits. Is this a problem or did somebody just look at top and decide the numbers weren’t aesthetic enough?

    1. 7

      Allowing one process to stay hot vs spraying requests across every CPU seems way preferable to me too. Gives you some of the locality benefits of single threads while still allowing you to scale up. Every web server graph should look like the epoll one.

      1. 4

        The BEAM’s scheduler tries to keep one process hot, for cache hits and (IIRC) it can reduce latency by keeping a core out of the lower-power states. Also, I think it will sometimes spin a core to avoid power state transitions, at least for a short while.

      2. 1

        Someone on HN said that having one worker handle all the Keep-Alive connections can be bad, because a big request (or several) may arrive later on one of those connections and would have to be handled by that single worker.

      3. 3

        That worker is a Stakhanovite!

        1. 1

          I like the URL title better: The sad state of Linux socket balancing

          1. 1

            Is Cloudflare really spawning a new OS thread for every connection? That seems incredibly heavyweight. Green threads are vastly lighter, and any good runtime (Haskell/Go/Erlang) will intelligently and dynamically distribute green threads across physical cores as needed (usually much faster than is possible with OS threads), as well as use an efficient epoll-like mechanism under the hood, which eliminates a lot of these considerations. Obviously they’ve thought about it, so if any CF people are here I’d love to hear the practical considerations involved.

            My guess is that it doesn’t even make sense to use SO_REUSEPORT if your listener thread can fork quickly enough. I’ll do a benchmark when I get home.

            Edit: Looks like I can do about 10,000 per second of fork/socket/connect/send “a”/recv(1) on the client and socket/bind/listen/(accept/fork/recv/send loop) on the server on my 1-physical-core 1.1GHz MacBook over localhost with Haskell’s stdlib networking stack. No optimizations, just doing whatever seems most obvious.

            1. 1

              > Is Cloudflare really spawning a new OS thread for every connection? That seems incredibly heavyweight.

              Goroutines seem to take around 0.5us to spawn (http://remogatto.github.io/go-benchmarks/). I don’t have modern benchmarks, but Linux pthreads from 2003, running on 2003 vintage hardware, took 20us per thread (https://lwn.net/Articles/10741/). Assuming a conservative 10% hardware speedup per year, that puts them at roughly 6us per thread today.

              The numbers above are, of course, highly suspect Fermi calculations, and should be taken with a sack of salt, but it doesn’t seem insane to spawn lots of threads.

              Maybe one day I will look at doing some better benchmarks.

              1. 1

                I don’t see where you see the OS thread per connection. The code snippets and explanations spawn a fixed number of worker processes that all listen on the same port. There are a few forks at startup, and then each process is ready to handle one connection at a time.
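
For anyone following the last reply, the pre-fork model it describes can be sketched roughly as below. This is a minimal Python illustration, not NGINX's or Cloudflare's actual code: the function names (`start_prefork_server`, `ask_which_worker`, `stop_workers`), the worker count, and the reply-with-PID behavior are all assumptions made up for the example. It relies on `os.fork`, so it only runs on Unix-like systems.

```python
import os
import signal
import socket


def start_prefork_server(num_workers=3, host="127.0.0.1"):
    """Bind one listening socket, then fork workers that all accept() on it."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, 0))  # port 0: let the kernel pick a free port
    listener.listen(128)
    port = listener.getsockname()[1]

    worker_pids = []
    for _ in range(num_workers):
        pid = os.fork()  # the only forks happen here, at startup
        if pid == 0:
            # Child: restore default SIGTERM handling so the parent can stop us.
            signal.signal(signal.SIGTERM, signal.SIG_DFL)
            try:
                while True:
                    # Blocking accept on the shared socket; each worker
                    # handles one connection at a time.
                    conn, _addr = listener.accept()
                    with conn:
                        conn.sendall(str(os.getpid()).encode())
            finally:
                os._exit(0)  # never fall back into the parent's code path
        worker_pids.append(pid)

    listener.close()  # the parent does not accept; workers keep their copies
    return port, worker_pids


def ask_which_worker(port, host="127.0.0.1"):
    """Connect once and return the PID of the worker that answered."""
    with socket.create_connection((host, port), timeout=5) as conn:
        data = b"".join(iter(lambda: conn.recv(64), b""))  # read until EOF
    return int(data.decode())


def stop_workers(worker_pids):
    """Terminate and reap every worker process."""
    for pid in worker_pids:
        os.kill(pid, signal.SIGTERM)
        os.waitpid(pid, 0)
```

Calling `start_prefork_server()`, then `ask_which_worker(port)` a few dozen times, and tallying the returned PIDs gives a crude view of how connections are spread across workers on a given machine. No thread or process is created per connection; which waiting worker gets each one is left to the kernel, which is the behavior this thread is arguing about.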