Excellent post. I especially like that it’s not looking for a winner but providing a basis of comparison and discussing the semantics around those options.
Thorough benchmarking code, too. Not just a set of one-off scripts.
Very interesting. I was surprised by how poorly some of the purpose-built message queues performed and how well Redis did, considering it offers queues pretty much as a handy afterthought.
All tests were run on a MacBook Pro 2.6 GHz i7, 16GB RAM.
These tests are evaluating a publish-subscribe topology with a single producer and single consumer.
Mean Latency (ms)
This got downvoted twice for being trollish.
Do people think these aren’t huge methodological problems which wholly invalidate this benchmark? Do folks seriously not understand why they’re huge methodological problems? Or are you huffy because I’m not being nice?
You separate your load generation tools from your system under test to eliminate confounding factors from shared resources: CPU, memory bus, thread scheduler, memory allocator, disk, etc. This benchmark doesn’t do that, which means it’s impossible to distinguish between a queue system which is being saturated with work and a queue system which is out-competing the load generators for CPU. Also, it’s a laptop running OS X. Do you plan on fielding a queue system in a DC built out with MacBooks? No? Then this benchmark might as well be on a phone for all the inferential value it provides.
A single producer and single consumer means essentially zero contention in most implementations. How well does this scale to your actual workload? What’s the overhead of adding a producer or a consumer? What’s the saturation point, or the sustainable throughput according to the Universal Scalability Law? It’s impossible to tell, since this is a single data point, which means it can’t distinguish between a system which has a mutex around accepting producer connections and a purely lock-free system. And that’s a shame, because wow would those two systems have different behaviors in production environments with real workloads.
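To make the USL point concrete, here’s a minimal sketch. The coefficients below (lam, sigma, kappa) are made up for illustration; they aren’t measurements of any of these queue systems. The point is that a single (1 producer, 1 consumer) data point tells you nothing about where the curve peaks:

```python
def usl_throughput(n, lam=1000.0, sigma=0.05, kappa=0.001):
    """Universal Scalability Law: modeled throughput at concurrency n.

    lam   -- ideal single-client rate (linear-scaling baseline)
    sigma -- contention penalty (serialized fraction of the work)
    kappa -- coherency penalty (cost of cross-client coordination)
    """
    return lam * n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

# Sweep concurrency to find the saturation point. With nonzero kappa,
# throughput peaks and then retrogrades: adding clients past the peak
# makes total throughput *worse*, which is exactly the behavior a
# one-producer/one-consumer benchmark can never reveal.
peak_n = max(range(1, 201), key=usl_throughput)
print(peak_n, round(usl_throughput(peak_n)))
```

Fit sigma and kappa from measurements at several concurrency levels and the same formula predicts the saturation point of a real system; that’s the experiment this benchmark would have needed.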
Measuring the mean of the latency distribution is wrong. Latency is never normally distributed, and if they recorded the standard deviation they’d notice it’s several times larger than the mean. What matters with latency are quantiles, ideally corrected for coordinated omission.
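Here’s a small sketch of why the mean misleads. The lognormal distribution and its parameters below are an illustrative assumption (heavy-tailed, like real latency data), not the benchmark’s numbers:

```python
import random
import statistics

random.seed(42)
# Synthetic latencies (ms) from a heavy-tailed lognormal distribution --
# a common shape for real latency data, and nothing like a normal curve.
samples = sorted(random.lognormvariate(1.0, 2.0) for _ in range(100_000))

mean = statistics.fmean(samples)
stdev = statistics.stdev(samples)
p50 = samples[len(samples) // 2]        # median
p99 = samples[int(len(samples) * 0.99)] # 99th percentile

print(f"mean={mean:.1f}  stdev={stdev:.1f}  p50={p50:.1f}  p99={p99:.1f}")
# The standard deviation dwarfs the mean, and the mean sits well above
# the median: no single "average" summarizes a distribution like this.
```

For a real benchmark you’d record every sample in something like HdrHistogram and report p50/p99/p999, with the load generator issuing requests on a fixed schedule so stalled requests aren’t silently dropped (the coordinated-omission problem).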
Do I need to trot out these exact same fucking critiques every time someone posts a poorly-designed benchmark here, or can we as a community develop a sort of short-hand notation for dismissing misleading analysis?