“Health checks are like Bloom filters. A failing health check means a service isn’t up, but a passing health check only means the service is probably ‘healthy’.”
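For anyone who hasn't met the data structure, here is a minimal Bloom filter sketch in Python. It shows the one-sided error the analogy relies on: a "no" is always correct, but a "yes" can be a false positive. The class name, parameters, and the salted SHA-256 hashing scheme are my own illustrative choices, not any particular library's API.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Minimal Bloom filter (illustrative): 'no' is always right, 'yes' may be wrong."""

    def __init__(self, size_bits: int = 64, num_hashes: int = 3) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item: str) -> bool:
        # False => the item was definitely never added.
        # True  => the item was probably added (or its bits collided with others).
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
bf.add("service-a")
print(bf.might_contain("service-a"))  # True: added items are never reported missing
```

With `size_bits=1`, every item maps to the same bit, so after a single `add` every query returns True. That is the extreme version of the false positives that make a passing check only "probably" healthy.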
From $work: a failing health check can also mean that your service is “fine” but some dependency or assumption of the health check is not OK. A recent example: the service can detect when it needs to serve responses in degraded mode, but the health checks treat that as irredeemable brokenness.
So they’re more like a two-way Bloom filter: either a “yes” or a “no” can be wrong.
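One way to keep a degraded-but-working service from being reported as broken is to let the health check give more than two answers. This is a minimal sketch, assuming hypothetical inputs (`can_serve`, `dependencies_ok`) and hypothetical helper names; real health endpoints and load balancers differ in what they can express, and many only understand pass/fail.

```python
from enum import Enum


class Health(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"    # still serving, with reduced functionality
    UNHEALTHY = "unhealthy"  # cannot serve requests at all


def check_health(can_serve: bool, dependencies_ok: bool) -> Health:
    # Hypothetical inputs: 'can_serve' says whether the process can answer
    # requests; 'dependencies_ok' says whether optional downstreams are reachable.
    if not can_serve:
        return Health.UNHEALTHY
    if not dependencies_ok:
        return Health.DEGRADED
    return Health.HEALTHY


def should_route_traffic(status: Health) -> bool:
    # Treat DEGRADED as "up" so a deliberately degraded service isn't pulled.
    return status is not Health.UNHEALTHY


def http_status(status: Health) -> int:
    # For pass/fail probes, only outright failure maps to a non-2xx code.
    return 503 if status is Health.UNHEALTHY else 200
```

The design choice is to keep "degraded" visible (for dashboards and alerts) while mapping it to "pass" for whatever only understands a binary answer.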
Good point! I was also thinking of the case where health checks go over one interface (say, management) and real traffic goes over another (data/Internet). A failure on just one of them makes the health check disagree with what users actually see.