This isn’t unorthodox at all! It all rings true to me. Vegeta is my tool of choice, which satisfies all of the enumerated requirements and more besides.
Can second the choice of Vegeta, incredibly easy to configure, but with a lot of flexibility and data output.

It’s also important that the loader generates those requests at a constant rate, best done asynchronously, so that response processing doesn’t get in the way of sending out new requests.
In my experience strict constant-rate load testing can make it hard to notice that you’ve reached the crippling point of the service. I once had very strange results, where the probe returned slow but still reasonable results, while the results from the loader machines indicated failure. It turned out the service could serve responses on all the connections it had open, but that wasn’t enough to satisfy the load set on the loader, which kept opening connections until it seemingly exhausted the available IP:port pairs, producing what initially looked like response errors. Adding more load-testing machines still produced those errors, which confirmed that the service was already at its worst point, but I also started putting strict limits on how many connections a single loader could open, since hitting that cap just shows up as being unable to satisfy the requested load rather than as fairly cryptic connection errors.
Loaders should allow you to specify the maximum number of connections that will be opened to the target.

They do, but most often the default is uncapped. I’ve learned that there should be a cap there, always. For most services, more than 1k connections from a loader will make no difference in what load you can achieve.
1k! Wowzers!

IME a single load generating process should probably max out at O(100) connections at most. It applies load from a single IP, which in normal circumstances would represent a single client. There’s a lot of machinery between a given client’s outbound connection and your application’s inbound socket, which will (should!) reach various limits re: connection cardinality well before your application is saturated.
(edit: I guess this is actually consistent with what you’re saying, that >1k concurrent connections is ineffective.)
And I guess any application that needs 1k+ connections per loader to stress its capacity would have average latency of like 500ms-1s+, maybe? My experience is that load testing tools usually assume a nominal latency of O(1-100ms) — anything more than that and you’re not really learning anything from the experiment.
In short, “the load you can achieve” should be evaluated via the application doing the work, rather than the network stack of the operating system hosting the application. If you hit TPS limits before the application maxes out on CPU or memory or whatever, you gotta change your test setup.
In my case, at that point the responses were at ~100ms on average with 512 workers, which was the very top of what would still be workable in an emergency scenario for the service. Even though I probably could have added more workers, the difference wouldn’t have been big, as I arrived at that number by doubling the number of workers until the maximum requests per second stopped increasing by more than 10%. From there you can bring your load down from “as high as it goes” to something more reasonable, confident that you’re not bottlenecking on load generation or introducing more workers than needed. But tools often have ridiculous defaults (e.g. Vegeta’s default connection limit is 10k, and the default worker limit is 18 quintillion!!!) that blow up in your face when you push a service near its limit and do nothing for you when you don’t, to the point that I wonder if the authors ever did such testing themselves.
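A minimal sketch of that doubling loop, assuming a hypothetical measure_peak_rps(workers) helper that runs one fixed-duration test with the given worker count and returns the achieved requests per second (the helper is a placeholder, not part of any tool mentioned here):

def find_worker_count(measure_peak_rps, start=8, min_gain=0.10):
    # Double the worker count until another doubling buys less than ~10% more
    # throughput; past that point the bottleneck is the service, not the loader.
    workers = start
    best = measure_peak_rps(workers)
    while True:
        doubled = workers * 2
        rps = measure_peak_rps(doubled)
        if rps < best * (1 + min_gain):
            return workers
        workers, best = doubled, rps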
Stating the obvious — if the average latency for a request is 100ms, then you get 10 RPS per client worker thread, i.e. per connection. The max RPS for an individual service is defined by whatever resource limit is reached first, so if, for example, each request takes negligible CPU/memory/etc. resources, then it’s indeed possible that your limits are defined by the server’s network stack, ephemeral port availability, etc. But that’s pretty unlikely. 100ms+ requests are usually that slow due to syscalls, i.e. disk or database reads/writes, and occasionally CPU saturation. Assuming that’s true, it’s been my consistent experience that those resources are exhausted miles before the limits of the network stack, ephemeral ports, etc. are hit. If you have counter-evidence, I’d honestly love to hear about it!
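As a back-of-the-envelope check on that arithmetic: Little’s law says in-flight requests ≈ arrival rate × time in system, which is where both the 10-RPS-per-connection figure and the earlier point about 1k+ connections implying ~1s latencies come from. A tiny sketch:

import math

def connections_needed(target_rps, avg_latency_s):
    # Little's law: concurrent in-flight requests = request rate * average latency.
    return math.ceil(target_rps * avg_latency_s)

print(connections_needed(1000, 0.100))  # 100  -> 1,000 RPS at 100 ms fits in ~100 connections
print(connections_needed(1000, 1.0))    # 1000 -> you only "need" 1k connections once latency hits ~1 s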
But tools often have ridiculous defaults (e.g. Vegeta’s default connection limit is 10k, and the default worker limit is 18 quintillion!!!) that blow up in your face when you push a service near its limit and do nothing for you when you don’t, to the point that I wonder if the authors ever did such testing themselves.
As an author of Vegeta I can assure you that the tool has been applied in extreme circumstances, i.e. 100M+ RPS in aggregate 😉 The default connection limit is meant as a sort of failsafe for default use cases, and the worker limit is (I think) the uint64 max, meant to express “spawn as many goroutines as necessary to [try to] meet the expressed target rate”, which is again a sort of default failsafe.
When you load test you don’t start by just cranking the knobs to max and seeing what happens. (Not that this is what you were suggesting, but it is something I’ve frequently seen in the wild.) The tools expect you to start with a low-intensity test that reliably succeeds, like 10 RPS and 10 connections and 10 workers or whatever, and then iteratively increase the load with successive tests of nontrivial duration until you discover and can reliably reproduce SLO violations, failures, etc. Maybe this could be made more explicit in the documentation.
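One possible shape for that ramp, sketched in Python around the Vegeta CLI. The flag names and the success and rate fields of the JSON report are my reading of recent Vegeta releases, so verify them against vegeta attack -h and vegeta report -type=json on your install; the target URL, the 0.99 success threshold, and the connection/worker caps are placeholders:

import json
import subprocess

TARGET = b"GET http://localhost:8080/\n"  # placeholder target

def attack(rate, duration="30s", connections=256, max_workers=256):
    # One fixed-rate attack; returns the parsed JSON report.
    cmd = [
        "vegeta", "attack",
        f"-rate={rate}",
        f"-duration={duration}",
        f"-connections={connections}",
        f"-max-workers={max_workers}",  # a finite cap instead of the effectively unbounded default
    ]
    results = subprocess.run(cmd, input=TARGET, stdout=subprocess.PIPE, check=True).stdout
    report = subprocess.run(["vegeta", "report", "-type=json"], input=results,
                            stdout=subprocess.PIPE, check=True).stdout
    return json.loads(report)

rate, last_good = 10, None
while rate <= 10_000:  # safety cap on the ramp
    report = attack(rate)
    success, achieved = report["success"], report["rate"]
    print(f"requested={rate}/s achieved={achieved:.1f}/s success={success:.3f}")
    if achieved < rate * 0.95:
        print("the loader, not the service, is the bottleneck; fix the test setup first")
        break
    if success < 0.99:  # stand-in for whatever SLO actually matters
        print(f"SLO violated somewhere between {last_good} and {rate} RPS")
        break
    last_good, rate = rate, rate * 2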
Yeah, 100ms was because the system was doing a bunch of actual work. If I recall correctly it was mostly CPU saturation, but disk IOPS were up there as well. The service could still accept new connections, though, and since it wasn’t satisfying the requested load, Vegeta started up more and more workers, up to the connection limit. The ephemeral port exhaustion happened because the tests were done in multiple fairly short stages, so ports got stuck in TIME_WAIT and the errors began showing up.
As for the knobs, it’s a lot better to have defaults that might not be enough but are guaranteed not to blow up in your face, rather than having to remember to always turn them down so they don’t cause issues. Take a service that is internally capped at 1 RPS and ask Vegeta to get 100 RPS out of it, and you’ll end up with errors that aren’t caused by the service. Try load testing this small server simulating CPU saturation: the results from Vegeta with default settings will show a bunch of errors when the service is returning none.
from aiohttp import web
import asyncio

# A single shared lock serializes request handling: each request holds it for
# 100 ms, so the server tops out at roughly 10 RPS no matter how many
# connections are open.
lock = asyncio.Lock()

routes = web.RouteTableDef()


@routes.get("/")
async def hello(request):
    async with lock:
        await asyncio.sleep(0.1)  # pretend to do 100 ms of work, one request at a time
    return web.Response(text="Hello, world")


app = web.Application()
app.add_routes(routes)
web.run_app(app)  # listens on aiohttp's default port, 8080
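For reference, one way to drive that server (an illustrative invocation, not part of the original comment; it assumes Vegeta is on PATH, current CLI flag names, and aiohttp’s default port 8080):

import subprocess

# Ask for 100 RPS from a server whose lock caps it at ~10 RPS, leaving Vegeta's
# generous default connection and worker limits in place. The backlog piles up
# inside the loader and eventually surfaces as client-side timeouts, even though
# the server itself never returns an error.
target = b"GET http://localhost:8080/\n"
attack = subprocess.run(["vegeta", "attack", "-rate=100", "-duration=10s"],
                        input=target, stdout=subprocess.PIPE, check=True)
subprocess.run(["vegeta", "report"], input=attack.stdout, check=True)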
the results from Vegeta with default settings will show a bunch of errors when the service is returning none.
Of course! The requesting client determines whether or not a request is successful, not the server. A saturated server may never see a request made by a client.
For context, Vegeta will always try to apply the load specified via the rate option against its target, subject to the constraints set by the connections, workers, etc. options. It also has per-request timeouts, same as any other reasonable HTTP client.
If the server you’re attacking is, for example, single threaded, and each request takes 100ms to serve, then you have a maximum theoretical throughput of 10 RPS. So if you attack with a rate of 10/s, then all is well. But if you attack with a rate of 20/s, then you’re going to produce a backlog of requests within Vegeta that will grow continuously for the duration of the test. If you run that test for 10 seconds, then you will produce 20/s * 10s = 200 requests, and your server will only be able to serve 10/s * 10s = 100 requests. Therefore Vegeta will report 200 total - 100 successful = 100 requests as failed.
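To make that arithmetic concrete, the same numbers as a throwaway calculation (nothing beyond what the paragraph above already says):

attack_rate = 20     # requests per second Vegeta is asked to send
service_rate = 10    # requests per second the single-threaded server can actually complete
duration = 10        # seconds

sent = attack_rate * duration     # 200 requests issued
served = service_rate * duration  # 100 requests completed within the test window
failed = sent - served            # 100 requests left waiting, reported as failures/timeouts
print(sent, served, failed)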
This is correct! A request which isn’t serviced before the requesting client times out and gives up is a failed request.