1. 37

Instead of trying to sift through blogspam, I figured I’d ask the community I’m already familiar with. You’re the best, lazyweb!

So, at $work for $client, I have an application (Node.js application server, nothing fancy) that I want to put under a realistic-looking load on the test server, in order to reproduce some performance issues and crashes that seemingly only occur under load in production.

My tendency would be to just run a shell script of something like forever { curl $API; sleep $RANDOM; } on a bunch of computers, but that seems crude. Are there better tools here?
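
For concreteness, the crude loop I have in mind would be something like this (sketched in Python rather than shell, purely for illustration; the endpoint is made up):

```python
# Crude closed-loop load: fire a request, pause a random interval, repeat.
# Roughly the "forever { curl $API; sleep $RANDOM; }" idea from above.
import random
import time
import urllib.request

API = "http://test-server.example/api/health"  # hypothetical endpoint

while True:
    try:
        urllib.request.urlopen(API, timeout=10).read()
    except Exception as exc:
        print("request failed:", exc)
    time.sleep(random.uniform(0, 2))  # random think time between requests
```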

  1.  

  2. 17

    I’ve used ab, siege, jmeter and wrk in the past. I think wrk was the best (for quick stuff and heavy load), unless you actually want to set up elaborate tests of complete workflows including login, doing stuff, logout, etc. - then jmeter.

    NB: I’ve not done web development full time for many years, so maybe there’s new stuff around.

    NB2: Always measure from a second host, a good stress testing tool will destroy the perf of the host it is running on.

    1. 2

      Do you know if wrk has fixed its coordinated omission problem, or should one still use wrk2 to get correct measurements?

      1. 1

        I’ve always used the (unmaintained-looking) wrk2 instead of wrk, precisely because of C.O. Nothing I could see in the wrk readme suggested they merged the C.O. work back in.

        1. 1

          No, as I said it’s been a while and I don’t remember this term or type of problem, sorry. It worked for me at the time.

      2. 14

        “realistic-looking load”

        I like https://locust.io because it enables me to simulate not only simple requests to a bunch of URLs but scripted workflows: different users, logging in or not, browsing pages in different order, and so on. Depending on your app, its performance characteristics and caching implementation, that might make a difference.
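
        As a rough idea of what that scripting looks like, here is a minimal locustfile sketch (paths, credentials and weights are all made up):

        ```python
        # locustfile.py - minimal sketch of a scripted user workflow for Locust
        from locust import HttpUser, task, between

        class BrowsingUser(HttpUser):
            wait_time = between(1, 5)  # simulated think time between tasks, in seconds

            def on_start(self):
                # Log in once per simulated user (hypothetical endpoint and creds)
                self.client.post("/login", json={"user": "test", "password": "test"})

            @task(3)
            def browse(self):
                self.client.get("/items")

            @task(1)
            def view_item(self):
                self.client.get("/items/42")
        ```

        You then run it with something like locust -f locustfile.py --host http://test-server:3000, either driving it from the web UI or headless with --headless -u 200 -r 20.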

        1. 1

          Yeah, I’ve been happy with Locust in two different incarnations of a load-testing tool at my current job.

          1. 1

            Same.

          2. 13

            We use vegeta at work and it’s pretty great.

            1. 3

              Importantly, it does not suffer from coordinated omission, which almost every other load generator does. Especially the ones you hack together on your own. Properly generating load is hard.

              Coordinated omission is when the load generator backs off on the pressure when the target is overloaded. It happens naturally, since you can’t keep an infinite number of outgoing requests active at once. You have to compensate for this effect.

              Coordinated omission sounds like something that will just result in overestimated performance numbers, but it’s worse than that. It can easily reverse figures, making performance improvements look like things got worse.
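
              To make the compensation concrete, here is a hand-rolled sketch (single sequential sender, made-up endpoint): the idea, as in wrk2 and vegeta, is to schedule requests against an intended constant rate and charge latency from the intended send time, not from whenever the previous response happened to come back:

              ```python
              # Open-loop measurement sketch: latency is measured from the *intended*
              # send time, so queueing delay caused by an overloaded target shows up
              # in the numbers instead of being silently omitted.
              import time
              import urllib.request

              TARGET = "http://test-server.example/api"  # hypothetical endpoint
              RATE = 50        # intended requests per second
              DURATION = 10    # seconds

              latencies = []
              start = time.monotonic()
              for i in range(RATE * DURATION):
                  intended = start + i / RATE
                  now = time.monotonic()
                  if now < intended:
                      time.sleep(intended - now)  # stick to the schedule when we can
                  urllib.request.urlopen(TARGET, timeout=30).read()
                  latencies.append(time.monotonic() - intended)  # includes backlog delay

              latencies.sort()
              print("rough p99:", latencies[int(len(latencies) * 0.99)])
              ```

              (A single blocking sender obviously can’t hold the rate once the target gets slower than 1/RATE per request, but because latency is charged from the intended time, the backlog shows up as latency instead of disappearing; real tools spread this over many concurrent connections.)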

              1. 1

                It seems like the Coordinated Omission problem is more or less a euphemism for backpressure. Backpressure is a cross-cutting concern, though. You can see it if any of your systems in the chain slow down, from the load generator, through the web server, appserver, etc. A bottleneck in any area makes the downstream systems look “better” under reduced load.

                The remedy is backpressure awareness: as you add more concurrency, watch that the graphs for throughput and requests/second increase in the same manner as the increase in VUs. I don’t think this is a tool problem, just an area to be aware of. For example, as I ramp up load, if I’m using a step pattern, bandwidth and hits/second should follow the step pattern. If they don’t, the test is invalid from the moment they deviate.
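
                A back-of-the-envelope version of that check (all numbers hypothetical):

                ```python
                # Per ramp step, compare the rate we meant to offer with what we achieved.
                # Once achieved throughput stops tracking offered load, results are suspect.
                def load_tracks_offered(offered_rps, achieved_rps, tolerance=0.05):
                    for step, (offered, achieved) in enumerate(zip(offered_rps, achieved_rps)):
                        if achieved < offered * (1 - tolerance):
                            return f"invalid from step {step}: offered {offered}, achieved {achieved}"
                    return "ok"

                # e.g. a 100/200/300 rps step ramp where the target saturates around 250 rps
                print(load_tracks_offered([100, 200, 300], [101, 198, 247]))
                ```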

                The worst tools, in my experience, are the ones that “throw load” when real users would be “stuck.” Those tools are far more dangerous and have worse failure modes. For example, if the page returned happens to have a 200 status code with a giant message saying “Critical System Failure”, a tool that ignores backpressure and slings load might show you improved performance once the error page appears, since it loads fast!

                1. 2

                  Good points! With your awareness of the problems involved, and willingness to declare a test invalid when the actual load doesn’t match the attempted load, I’m not too worried about you drawing the wrong conclusions – no matter what tool you use.

                  The problem you mention in your last paragraph is a problem whether or not your tool corrects for constant attempted load. Any tool ignores backpressure when the “backpressure mechanism” is a very fast 200 OK.

            2. 10

              I like hey for its simplicity.

              1. 5

                I love hey, but recently found it to be unmaintained and switched to oha.

              2. 10

                I’ve only used it for a sum total of 30 minutes, but https://k6.io/ was pretty easy to get going.

                1. 4

                  Never even heard of k6 before, it looks wicked. Nice one.

                  1. 3

                    I’m also a fan of k6, since it’s no more complex to get going with than most one-line CLI tools, but you have full-on JS scripting if you need to script complex behavior.

                    1. 3

                      I’ve also had success using k6. Solid tool.

                    2. 8

                      In the past I’d have taken a very similar approach and just bashed something together, admittedly with ab rather than curl. But it flat-out lied to me so many times that when I finally realised it was just totally unreliable, I vowed never to use it again. That was probably 15 years ago, so I might well be missing out on a new, better (working?) version, but I totally don’t care.

                      I definitely do still write custom test tools for API integration tests, especially where there are e.g. login or register flows or anything where there’s client state to manage. But for performance and load, where it’s often about a large and somewhat-randomised set of accesses to broadly the same set of URLs, a targeted load tool is really helpful, if not essential, because it allows you to treat your test processes as a flock, configure in broad sweeps, and hammer repeatably and reliably, without worrying so much about having to check/test your test code.

                      For straight up web and web API, I’ve used Siege to great effect, it balances flexibility with straightforwardness very well. tsung is also very capable but can be quite complicated to configure how you want, as it does a lot, not only web load. Also httperf has been useful, though it seems a bit old-school now there are cool kids like Artillery on the block, which is neat and shiny, but I’ve sometimes found has a few rough edges.

                      As for websockets: for testing connection load, i.e. just “how many WS conns can I establish and keep alive”, tsung and Artillery are both very capable. But for anything that actually requires maintaining client state in websocket comms (which seems to me like, well, all websocket applications, as it’s usually supporting a rich front-end), I pretty much always end up rolling my own, using some combination of Erlang/OTP, gun (http(s)/ws(s) client) and syn (process group management/pub-sub).

                      The multicore concurrency is so good, so straightforward and so automatic to scale to however many cores you throw at it (well, there are limits but I’ve never hit them), that it’s really powerful for generating and maintaining a lot of stateful and complex load patterns.

                      I usually end up needing a much bigger machine for the clients than for the server, because each one has its own state and probably much more involved work to do than the back-end (again, talking about rich interactive web apps), but I basically have zero orchestration problems, because it’s just the Erlang VM running on a mahoosive instance with tens or hundreds of cores, managing the distribution for you - unlike if it was in Node and you wanted to go past 1 core, so you had to start using process/cluster management tools or whatever.

                      Oops. Essay. I just love load testing websocket app servers ¯\_(ツ)_/¯

                      Edit: oooh, some excellent other suggestions, I totally forgot about JMeter and wrk, they’re good too. I never really got on with the Swing-y JMeter UI - but I feel like choices between one of those and e.g. siege are mostly personal preference.

                      1. 4

                        https://locust.io/

                        I’ve had a fair bit of luck using their framework, and tested application servers that use websockets, HTTP streams, and apps that require logins/creds, using a somewhat realistic load with generated data. Locust can run as a cluster, so that’s a plus as well. The metrics are pretty good, but my only complaint is that the output is not in a JSON-readable form, so it’s sometimes a little tricky to run the entire test suite as a script with separate control and experimental tests. Ultimately, Locust worked well and we figured out what our rate limits were for a variety of different endpoints without too much effort.

                        1. 2

                          I’ve used gatling in the past, and feel so-so about it. It’s rather powerful but that power relies on writing scala using a relatively poorly documented API. I’ve used jmeter, and would not recommend it for most things, but it might work for your scenario.

                          One thing to keep an eye out for is coordinated omission. It sounds like your curl; sleep method would be vulnerable to it. https://github.com/artilleryio/artillery/issues/721

                          1. 2

                            At Wikimedia we use ab plus gnuplot mostly because they’re simple and well tested tools that are packaged in Debian.

                            Our Python wrapper around those is open source as well.

                            1. 2

                              As a company with a lot of Java, we use Gatling as it’s Scala, so as a JVM language there’s a rough level of familiarity.

                              It does what we need it to, but it’s not a language we know very well, so it makes a fair few things more awkward, especially where we have more complex projects.

                              1. 2

                                One thing worth mentioning is that you often need different tools for different scenarios. I work at Elastic and often help customers with Elasticsearch performance testing. Our preferred testing tool for Elasticsearch is Rally, the reason being that we maintain it and understand where the bottlenecks could be.

                                I’ve had a couple of situations where the customer was using JMeter or Locust and it was really hard to tell whether they were connecting according to best practices, and we ultimately discovered issues (networking and slow code in the critical path). As such, I’d recommend you use the recommended tool for testing databases (e.g. pgbench for PostgreSQL).

                                1. 2

                                  This is not what the question asked, but since details of the question reveal some gaps in knowledge about load testing, I’ll take a moment to address one issue:

                                  Knowledge of how to properly measure and test load is way more important than choice of tooling. Gil Tene talks about this in an approachable way. A good start might be the YouTube video that got me down the rabbit hole a long time ago: https://m.youtube.com/watch?v=lJ8ydIuPFeU

                                  1. 1

                                    Really good content in that, in general!

                                    I have some thoughts on the Coordinated Omission problem, but I moved them to your comment above.

                                  2. 1

                                    Locust & Tsung

                                    1. 1
                                      1. 1

                                        I always found ab to be super cumbersome to use, so generally I’d reach for siege.

                                        1. 1

                                          At my previous company, I set up Artillery to do this for our Node API. We had some complicated auth schemes, and Artillery let us easily set up a YAML config with a realistic user path for each scenario.