I personally think of auto-scaling as a bit of a minefield, both in planning and in practice. For example, one counter-intuitive issue is that the steepest demand ramp your system can handle is a function of the startup time of each additional process and your maximum acceptable response time.
Process startup times on cloud platforms can be really long. One quite popular service adds 2 minutes to the program’s own startup time. That makes auto-scaling on it of very limited use for handling demand peaks. It’s only really good for saving money on day/night cycles.
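To put rough numbers on that, here is a back-of-the-envelope sketch. It is a toy model of my own, not a measurement: demand starts equal to current capacity and rises linearly, no new capacity comes online until one full startup time has passed, and requests queue FIFO. The function name, the 1000 req/s capacity, the 2 s wait budget and the 10 s program startup are all made-up illustration values; only the 2-minute platform delay comes from the case above.

```python
def max_ramp_slope(capacity_rps: float, max_wait_s: float, startup_s: float) -> float:
    """Steepest linear demand ramp (req/s per second) survivable for one startup window.

    Toy model (assumption): demand rises at slope g above a fixed capacity.
    After t seconds the backlog is g * t**2 / 2, and a request arriving then
    waits backlog / capacity_rps. Requiring that wait to stay within
    max_wait_s at t = startup_s and solving for g gives the bound below.
    """
    if capacity_rps <= 0 or startup_s <= 0 or max_wait_s < 0:
        raise ValueError("capacity and startup time must be positive, wait budget non-negative")
    return 2.0 * capacity_rps * max_wait_s / startup_s ** 2


if __name__ == "__main__":
    own_startup_s = 10.0      # hypothetical: the program's own startup time
    platform_delay_s = 120.0  # the extra 2 minutes added by the platform

    fast = max_ramp_slope(capacity_rps=1000.0, max_wait_s=2.0, startup_s=own_startup_s)
    slow = max_ramp_slope(capacity_rps=1000.0, max_wait_s=2.0,
                          startup_s=own_startup_s + platform_delay_s)
    print(f"tolerable ramp without platform delay: {fast:.2f} req/s per second")
    print(f"tolerable ramp with platform delay:    {slow:.2f} req/s per second")
```

In this model the tolerable ramp falls with the square of the startup time, so going from 10 s to 130 s cuts it by a factor of 169. Real systems have headroom and smarter scalers, but the direction of the effect should be the same.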
I don’t want to say that auto-scaling is harmful per se either, but I have seen, on balance, more issues caused by it than solved. A fair chunk of that is because it’s overused, or used very casually with an extremely large range.
What I got from the PostgreSQL connection example is that it’s impossible to do the right thing without being able to express additional constraints on the pod. K8s needs to know about the connection limit just as much as it needs to know how much memory is installed in the cluster. Otherwise you have to either permanently overprovision or deal with the risk of this failure case.
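The constraint itself is simple arithmetic. As a minimal sketch, assuming every pod holds a fixed-size connection pool (the function name and the pool size of 10 are hypothetical; 100 and 3 are PostgreSQL's defaults for `max_connections` and `superuser_reserved_connections`), the replica ceiling that the scheduler would need to respect looks like this:

```python
def max_safe_replicas(max_connections: int,
                      reserved_connections: int,
                      pool_size_per_pod: int) -> int:
    """Largest replica count whose connection pools still fit in the database.

    Assumption: each pod opens at most pool_size_per_pod connections and
    nothing else competes for the non-reserved connection slots.
    """
    if pool_size_per_pod <= 0:
        raise ValueError("pool_size_per_pod must be positive")
    usable = max_connections - reserved_connections
    # Floor division: a pod whose pool only partly fits does not count.
    return max(usable // pool_size_per_pod, 0)


if __name__ == "__main__":
    ceiling = max_safe_replicas(max_connections=100,
                                reserved_connections=3,
                                pool_size_per_pod=10)
    print(f"replica ceiling: {ceiling}")
```

The catch is that this number lives outside the cluster's own resource model: someone has to compute it by hand and keep any replica limit in sync with the database settings.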
I haven’t done a k8s config in a couple of years; is it possible to add custom constraints like this nowadays? (Of course, the question is about scaling and pod placement generally, not auto-scaling per se.)