It seems reasonable to me that you would default to using whatever resources are available, so I hope this proposal gets the go-ahead.
The rule of thumb I’ve usually used is “number of CPUs × 2” for threads, to improve the chances that when one thread pauses, other threads can fill in the CPU utilisation.
Is that not necessary here because the Go runtime is mapping goroutines to OS threads to explicitly soak up any pauses for I/O?
Yes. When GOMAXPROCS=1 was the default, there was no parallelism but there was still plenty of concurrency.
Sorry, I meant I was wondering why the new default was just “number of CPUs” rather than “number of CPUs × 2”, the latter being a rule of thumb to ensure that there are enough threads on hand to max out the CPUs.
Rule of thumb where? In Java a lot of async code is really hiding the waiting in a thread somewhere. In Go and Erlang the runtime can actually do something else while waiting, so running more schedulers than cores would just oversubscribe the system.
Usually when this happens in Java, it’s because you actually need to do work on a CPU. Presumably, for the same problem, you would still need to do that work on a CPU in Go or Erlang. I’m not sure why there would be a difference other than semantically (unless someone decided to have a thread just block on an async call, which sounds like a bug to me).
> Usually when this happens in Java, it’s because you actually need to do work on a CPU. Presumably, for the same problem, you would still need to do that work on a CPU in Go or Erlang.
But Go and Erlang are actually utilizing non-blocking I/O primitives, so they can preempt a computation and switch back to the I/O task. Essentially, you should be able to peg the CPU completely with CPU-bound computations while waiting for I/O.
Java can utilize non-blocking I/O, but in my experience, code often doesn’t. It instead uses thread pools (Executors), does blocking I/O, and provides an async interface (Futures, Listeners, etc.) to the completing work.
I feel like most folks I’ve talked to who are doing serious work where they care about performance are using Netty, which is designed from the ground up to be non-blocking.
Perhaps I’m biased though, because I work on a library for the JVM that doesn’t do blocking I/O.
My experience is that “num CPUs + 2” results in better coverage. With more CPUs, ×2 just results in a lot of thrashing.
Have there been any tests on whether “number of CPUs × 2” is better than “number of CPUs”? Is this from your personal experience? I admit it makes sense logically.
I’ve seen similar results for one workload, but I’m not sure I agree with it as a rule of thumb in general. As with any configuration setting, you should run your own experiments to see what works for you.
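One way to run such an experiment is a sketch like the one below (the `burn` and `timeWorkload` helpers and the workload size are placeholders, not a real benchmark): time a fixed batch of CPU-bound tasks under each candidate GOMAXPROCS value and compare.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// burn does a fixed amount of CPU-bound work.
func burn(n int) float64 {
	x := 1.0001
	for i := 0; i < n; i++ {
		x *= 1.0000001
	}
	return x
}

// timeWorkload runs `tasks` CPU-bound tasks across goroutines under the
// given GOMAXPROCS setting and returns the wall-clock time taken.
func timeWorkload(procs, tasks int) time.Duration {
	prev := runtime.GOMAXPROCS(procs)
	defer runtime.GOMAXPROCS(prev)

	start := time.Now()
	var wg sync.WaitGroup
	for i := 0; i < tasks; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			burn(5_000_000)
		}()
	}
	wg.Wait()
	return time.Since(start)
}

func main() {
	ncpu := runtime.NumCPU()
	// Compare the rules of thumb from this thread against each other.
	for _, procs := range []int{1, ncpu, ncpu * 2} {
		fmt.Printf("GOMAXPROCS=%d: %v\n", procs, timeWorkload(procs, 4*ncpu))
	}
}
```

For a purely CPU-bound workload like this one, values above the core count would not be expected to help; the interesting cases are mixed CPU/I-O workloads, which is exactly why you should measure your own.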