I do agree that it’s definitely a problem: if one thing goes wrong in your state and you urgently need to provision something else within the same Terraform module/state, you can end up truly stuck. Thankfully I haven’t run into that yet, but I should be better prepared for it.
To be honest, you run into these sorts of problems in other environments too. For example, I couldn’t switch-env on my NixOS installation because emacs-wayland no longer compiled, which made it impossible to apply the rest of the changes.
What this article doesn’t mention is Terragrunt. With Terragrunt, for example, my org splits VPC and EKS into separate modules/states, and they are still properly linked to each other by defining outputs in one and inputs in the other. You’ll need this not only because you will eventually face the problem scenario Qovery describes, but also because Terraform becomes very slow once you reach a critical number of resources. For provisioning production through CI that’s fine by me, but development becomes a real pain if every apply of your module takes 90 seconds.
In my experience Terraform is great for infrastructure. Most of the issues mentioned have solutions, like splitting the state and managing the multiple steps externally or via plugins. For app deployments, on the other hand, it sucks (I’m not sure if that’s what they meant by “services”). I’d struggle to define the threshold precisely, but once the thing you deploy frequently is very flexible in its behaviour, failure modes, etc., it just doesn’t fit that model.
It’s not even specific to Terraform; CloudFormation doesn’t work for this either. A case like “your app configuration is not correct” can often be recovered from very quickly by wiping it and redeploying the previous or a new version. Infrastructure management frameworks, however, usually go with “well, let’s give it some time to maybe stabilise, and then destroy the relevant resources”, which can take ages. Then come issues like “we need to wait for the failed app deployment to time out, but the fix requires an infrastructure change, and that’s blocked on the app deployment”.