1.

    The OOM killer is, IMNSHO, broken as designed. Track how much memory is available, return NULL when an allocation can't be satisfied, and let the application deal with it then, while it can still be dealt with, instead of killing a random (I know, not really random) process later. I disable the OOM killer whenever feasible.
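
    Checking the allocator's return value is exactly what this policy enables. A minimal sketch in C (the wrapper name and the halving strategy are my own illustration, and it only helps on systems that actually hand back NULL, e.g. Linux with `vm.overcommit_memory=2`):

    ```c
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical wrapper: on allocation failure, retry with smaller
     * requests instead of crashing. Only meaningful when the system
     * really returns NULL rather than overcommitting. */
    static void *alloc_or_shrink(size_t want, size_t *got) {
        while (want > 0) {
            void *p = malloc(want);
            if (p) {
                *got = want;   /* report how much we actually got */
                return p;
            }
            want /= 2;         /* degrade: ask for half as much */
        }
        *got = 0;
        return NULL;
    }

    int main(void) {
        size_t got = 0;
        char *buf = alloc_or_shrink(1u << 20, &got);
        if (buf) {
            memset(buf, 0, got);
            printf("allocated %zu bytes\n", got);
            free(buf);
        } else {
            fputs("out of memory, degrading gracefully\n", stderr);
        }
        return 0;
    }
    ```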

    1.

      In practice, though, C++ throws and Rust panics; I think only well-written C code would have a chance of behaving ‘correctly’ in this case? And that’s the kind of low-level process that’s unlikely to be selected by the OOM killer.

      So effectively, letting the application deal with it equals letting the application crash. The application that runs into this situation can be whatever application happens to need an allocation at some point. That seems more random than what the OOM killer targets?

      1.

        That’s not the OS’s decision to make, though. With the OOM killer enabled, C/C++ doesn’t have the option to handle it differently, and if Rust or Go ever want to change how they handle allocation failure in the future, they can’t while the OOM killer is in play. It’s too strong a policy decision for features as low-level as allocation and process lifetime.

        (Of course, I haven’t written a kernel used by billions, so it’s easy for me to judge.)

        1.

          Sounds to me like a good opportunity for an opt-in flag asserting that a particular binary handles allocation failures gracefully: return NULL to those processes when appropriate, and let the OOM killer deal with everything else.
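
          Linux does have a per-process knob pointing in this direction: `/proc/<pid>/oom_score_adj`, where -1000 exempts a process from the OOM killer entirely and 1000 makes it the preferred victim (lowering the value needs privilege). What’s missing is the companion half, “give me NULL instead.” A read-only sketch in C (the -1001 sentinel for “unavailable” is my own convention):

          ```c
          #include <stdio.h>

          /* Read this process's oom_score_adj: -1000 means exempt from
           * the OOM killer, 1000 means preferred victim. Returns -1001
           * (a made-up sentinel) when the file is unavailable, e.g. on
           * non-Linux systems. */
          static int read_oom_score_adj(void) {
              FILE *f = fopen("/proc/self/oom_score_adj", "r");
              int v = -1001;
              if (f) {
                  if (fscanf(f, "%d", &v) != 1)
                      v = -1001;
                  fclose(f);
              }
              return v;
          }

          int main(void) {
              int v = read_oom_score_adj();
              if (v == -1001)
                  puts("oom_score_adj not available here");
              else
                  printf("oom_score_adj = %d\n", v);
              return 0;
          }
          ```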

        2.

          If capacity planning were done and limits were set on processes or process groups, the ones violating their own capacity would be the ones degraded.
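
          A process can even impose that limit on itself. A sketch in C using `setrlimit(RLIMIT_AS)` (the cap and request sizes are arbitrary illustrations; enforcement against `malloc` is a Linux behavior): once the address space is capped, oversized allocations fail locally with NULL, so the violator degrades instead of the rest of the system.

          ```c
          #include <stdio.h>
          #include <stdlib.h>
          #include <sys/resource.h>

          /* Cap this process's address space, then attempt an
           * allocation. Anything exceeding the self-imposed cap fails
           * with NULL, long before the system-wide OOM killer is
           * involved. */
          static void *try_big_alloc(size_t cap_bytes, size_t want_bytes) {
              struct rlimit rl = { cap_bytes, cap_bytes };
              if (setrlimit(RLIMIT_AS, &rl) != 0)
                  return NULL;           /* couldn't set the limit */
              return malloc(want_bytes); /* NULL if it busts the cap */
          }

          int main(void) {
              /* Cap at 512 MiB, then ask for 1 GiB: fails gracefully. */
              void *p = try_big_alloc(512ul << 20, 1ul << 30);
              if (p == NULL)
                  puts("denied by our own limit, degrading");
              else
                  free(p);
              return 0;
          }
          ```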

          1.

            OpenVMS used process limits for that reason, plus accounting purposes, as the link says. On top of that, they had both virtualized kernels and clustering to mitigate failures at that level.