1. 46
  1. 13

    TLDR: If you assign CPU quotas to containers but size their thread pools to every core the machine has, they’ll immediately get throttled by the total CPU time those threads consume. Instead, give them only as many cores as they can use without throttling, so tail latency stays OK and you don’t overload node after node.
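
    A rough illustration, with made-up numbers but the default CFS period: a container with a 2-CPU quota gets cfs_quota_us=200000 per cfs_period_us=100000. If 64 threads all run at once, they burn those 200 ms of CPU time in roughly 200/64 ≈ 3 ms of wall clock, and the container then sits throttled for the remaining ~97 ms of the period.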

    1. 6

      At $DAYJOB our advice has been “set GOMAXPROCS to some value close to the number of CPUs you’re asking for” (usually the same, since Go doesn’t count io-blocked goroutines). Otherwise teams would ask for 1 or 2 CPUs and then fall over in latency, because GOMAXPROCS defaults to the host’s core count and they’d run several times more parallel threads than their quota covered.
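
      A minimal sketch of that advice in Go, assuming the widely used go.uber.org/automaxprocs package (not necessarily what $DAYJOB uses), which sets GOMAXPROCS from the container’s cgroup CPU quota at startup:

      ```go
      package main

      import (
          "fmt"
          "runtime"

          // Side-effect import: adjusts GOMAXPROCS to the cgroup CPU quota.
          _ "go.uber.org/automaxprocs"
      )

      func main() {
          // Without the package you could set it by hand, e.g.
          // runtime.GOMAXPROCS(2) to match a 2-CPU request.
          fmt.Println("GOMAXPROCS =", runtime.GOMAXPROCS(0)) // 0 = query only
      }
      ```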

      1. 3

        Plus, it took kernel patches to achieve that, along with a lot of other investigation that’s worth a read, IMO.

        1. 3

          investigation worth a read

          True, but the article is really long, with a lot of “what ifs”, so it’s easy to get lost.

      2. 4

        In my experience, the best solution for latency-sensitive workloads is to use pinned CPU cores (the “CPU Pinning and Isolation” approach). If there are 64 cores on a machine, and your distributed scheduler (Mesos, Kubernetes, …) placed 63 cores’ worth of workload onto it, then there isn’t any need for CFS to get involved. Performance is nicely predictable regardless of traffic patterns.

        Oversubscription is something you want to do only for latency-insensitive workloads, which implies that (1) users have to tell you whether their processes care about latency or not, and (2) it must be possible to selectively evict workloads from an overloaded machine without causing the latency-sensitive workloads to notice.

        Kubernetes (as noted) has some built-in support for this, activated by a somewhat arcane selection mechanism (IIRC you set the CPU request and the CPU limit to the same value, which must be an integer; see the sketch below).
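
        If I remember the mechanism right, it’s the kubelet CPU Manager’s “static” policy: a pod lands in the Guaranteed QoS class when requests equal limits, and a container that requests an integer number of CPUs then gets exclusive cores. A sketch (name and image are made up):

        ```yaml
        apiVersion: v1
        kind: Pod
        metadata:
          name: pinned-example        # hypothetical
        spec:
          containers:
          - name: app
            image: example/app:1.0    # hypothetical
            resources:
              requests:               # requests == limits -> Guaranteed QoS
                cpu: "4"              # integer CPU count -> exclusive cores
                memory: 2Gi
              limits:
                cpu: "4"
                memory: 2Gi
        ```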

        1. 4

          Honestly, it sounds like the main problem here is the scheduler. I’m not saying you should run many massive threadpools, but at the end of the day, if you have a latency-sensitive service that isn’t being given CPU for seconds at a time, your scheduler isn’t suited to a latency-sensitive service.

          Bursting is good: you are using resources that would otherwise be idle. It sounds here like the scheduler is punishing the task for the scheduler’s mistake. CFS ensures that the job gets N cores on average, when what you actually want is for the scheduler to ensure that the job gets N cores minimum (see the sketch below).

          So while having too many threads lying around puts slight unnecessary pressure on the scheduler and wastes memory, I don’t think it should be causing these huge latency spikes.
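
          For what it’s worth, the cgroup interface can already express that “minimum, not average” idea. A rough sketch, assuming cgroup v2 (the path and weight are made up): use cpu.weight as a proportional floor under contention, and leave cpu.max uncapped so the job can burst into idle cores without being throttled.

          ```go
          package main

          import (
              "log"
              "os"
          )

          func main() {
              cg := "/sys/fs/cgroup/myjob" // hypothetical cgroup for the job

              // cpu.weight: proportional share under contention (default 100).
              // This acts as a floor when the machine is busy, not a ceiling.
              if err := os.WriteFile(cg+"/cpu.weight", []byte("400"), 0o644); err != nil {
                  log.Fatal(err)
              }

              // cpu.max: "max <period_us>" disables the quota entirely, so the
              // job is never throttled for bursting into otherwise-idle cores.
              if err := os.WriteFile(cg+"/cpu.max", []byte("max 100000"), 0o644); err != nil {
                  log.Fatal(err)
              }
          }
          ```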