A lot of caches are implicitly availability caches. I think what’s being discussed is a cache that continues to serve data past its expiration date. Of course, a latency cache will also hide availability failures for a limited duration. For instance, using a web cache with even just a one-minute expiry will reduce load on the backend, but will also let you restart and reconfigure the backend at will, even in production.
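To make the distinction concrete, here’s a minimal sketch (names, TTL, and the caller-supplied `fetch` function are all illustrative assumptions) of a cache that behaves as a latency cache while entries are fresh, but falls back to serving stale data when the backend is unavailable:

```python
import time

class StaleServingCache:
    """Serve fresh entries when possible; fall back to expired entries
    if the backend raises. A sketch, not a production implementation."""

    def __init__(self, fetch, ttl=60.0):
        self.fetch = fetch          # caller-supplied backend lookup (assumed)
        self.ttl = ttl              # one-minute expiry, as in the example above
        self.store = {}             # key -> (value, expiry_timestamp)

    def get(self, key):
        entry = self.store.get(key)
        now = time.time()
        if entry is not None and now < entry[1]:
            return entry[0]         # fresh hit: pure latency cache
        try:
            value = self.fetch(key)
        except Exception:
            if entry is not None:
                return entry[0]     # backend down: serve stale (availability cache)
            raise                   # no stale copy to fall back on
        self.store[key] = (value, now + self.ttl)
        return value
```

While the backend is down for a restart, any key fetched within the last TTL keeps being served from the stale copy.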
I feel like “capacity” isn’t the right word, since it could also refer to storage. A more suitable alternative might be “connection capacity”.
Capacity isn’t limited to storage: if I say my home network has reached its capacity limit, I don’t mean that I’m out of storage space on my switch. The same goes for CPU and IOPS.
Connection capacity is something more specific, where you model your capacity in terms of the number of open connections (as distinct from requests per second). One example would be a router doing NAT that has to track all the TCP connections going through it. What you’re usually looking at there are memory and file descriptor limits. Depending on the exact type of service, there may also be a relevant CPU or network bottleneck.
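As a rough sketch of that kind of modelling (the per-connection byte figure and the function name are assumptions for illustration, not measured values): the connection capacity of a userspace service is bounded by whichever runs out first, file descriptors or the memory budget for per-connection tracking state.

```python
import resource

# Assumed per-connection tracking overhead; real figures depend on the
# service (e.g. kernel conntrack entries are sized differently).
BYTES_PER_CONN = 350

def connection_capacity(tracking_memory_bytes):
    """Estimate max open connections from fd and memory limits.

    File descriptors cap connections a userspace proxy can hold open;
    tracking memory caps table entries. The binding limit is the smaller.
    """
    soft_fds, _hard_fds = resource.getrlimit(resource.RLIMIT_NOFILE)
    by_memory = tracking_memory_bytes // BYTES_PER_CONN
    return min(soft_fds, by_memory)
```

Note this deliberately ignores CPU and network bottlenecks, which (as above) may dominate for some services.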
I’ve some slides that give an introduction to Capacity Planning that may be of interest.
Capacity isn’t limited to storage
Hence, “it could also refer to storage”.
Depending on the exact type of service there may also be a CPU or network bottleneck that’s relevant.
I wasn’t aware there were so many different factors.
Is there no specific term for ‘requests per second’?
Request throughput is probably the closest, though it doesn’t quite roll off the tongue.