1. 4

    I think Łukasz Langa, Python core developer, has some serious comments about the benchmark setup: https://twitter.com/llanga/status/1271719778324025349?s=19

    1. 3

      Thanks for linking this. A bit of rebuttal from me:

      1. As I stated in the article, I did try 4 async workers. Performance was worse than with 5 workers (though not hugely). I don’t have a potted explanation for this I’m afraid - I can only say for sure that using 4 async workers instead of 5 did not change the results for the better in asyncland. (Note: I tried varying the worker numbers for all frameworks individually, not as a collective).

      2. I take the point about running the whole thing on one machine. It would have been better not to, of course. It seems unlikely that doing so changed the result, since load on the other components was so low. I would be keen to read of any benchmark results using such a multi-machine setup, particularly any that find in favour of async, as I don’t know of any. I would add, for anyone hoping to replicate my results (as a friend of mine did): it takes a lot of time. It’s not enough, in my opinion, to just throw up these servers in a naive manner; you need to make a good faith effort to tune and add infrastructure to improve performance. For example, when I ran the async servers without a connection pool they broke everything (including themselves) - see the sketch after this list.

      3. Beyond my own results, there is a chunky body of extant “sysadmin lore” that says: async is problematic under load. I reference a few of the publicly available reports in my article: from Etsy; claims from inside a ridesharing startup; etc. I have had negative private experiences too (prior to asyncio). The SQLAlchemy author wrote several years ago about this problem and kindly appeared in the HN thread to repeat his claims. The Flask author alluded to unfavourable private benchmarks, presumably from his workplace. The list goes on (including in other language communities).
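
      For concreteness on that last point, here is a minimal sketch of what “adding a connection pool” can look like for an async server, using asyncpg. The DSN, pool sizes and handler shape are placeholders for illustration, not the exact settings from the benchmark:

      ```python
      # Hypothetical sketch: bound the async server's database access
      # with a connection pool so a load spike cannot open an unbounded
      # number of connections.
      import asyncpg

      pool = None

      async def startup():
          global pool
          pool = await asyncpg.create_pool(
              dsn="postgresql://user:pass@localhost/db",  # placeholder DSN
              min_size=5,   # assumed floor
              max_size=20,  # caps concurrent database connections
          )

      async def handler(request):
          # Borrow a pooled connection rather than opening a fresh one
          # per request.
          async with pool.acquire() as conn:
              return await conn.fetchval("SELECT 1")
      ```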

      1. 4

        Hi.

        The point about not scaling above ~4 workers on 4 vCPUs has little to do with 4 vs 5 workers. It’s about being able to saturate your CPU cores with far fewer processes than sync workers need.

        You could at least acknowledge in your post that sync frameworks achieve on-par performance by using more memory. It’s hard to do an exact apples-to-apples comparison, but the idea stands: async frameworks allow much denser resource usage.
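
        To make the density point concrete, a back-of-envelope sketch - every number here is an illustrative assumption, not a measurement:

        ```python
        # Hypothetical sizing: sync vs async workers for the same
        # target concurrency on a 4-core box.
        cores = 4
        sync_worker_rss_mb = 60    # assumed memory per sync worker
        async_worker_rss_mb = 70   # assumed memory per async worker
        target_concurrency = 100   # in-flight requests, mostly IO wait

        # A sync worker serves one request at a time, so concurrency is
        # bounded by the process count.
        sync_workers = target_concurrency
        # An async worker multiplexes many waiting requests, so one
        # worker per core is enough.
        async_workers = cores

        print(f"sync:  {sync_workers * sync_worker_rss_mb} MB")    # 6000 MB
        print(f"async: {async_workers * async_worker_rss_mb} MB")  # 280 MB
        ```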

        The reason running your database alongside your Python process is not a realistic case goes beyond the operational problems with it (no high availability, no seamless scaling, hard upgrades and backups). The problem is that it unrealistically minimizes latency between the services. It doesn’t take much for the sync advantage to go away as soon as you put the database on a separate box.
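
        A rough sketch of that latency effect, again with invented figures:

        ```python
        # Hypothetical back-of-envelope: database round-trip latency caps
        # the throughput of blocking sync workers.
        def max_rps(workers, per_request_ms):
            # Each sync worker serves exactly one request at a time.
            return workers * 1000.0 / per_request_ms

        workers = 16      # assumed sync worker count
        query_ms = 1.0    # assumed time the database spends working

        print(max_rps(workers, query_ms + 0.05))  # same box:   ~15240 rps
        print(max_rps(workers, query_ms + 1.0))   # remote box:  ~8000 rps
        ```

        An async worker spends that extra round trip servicing other requests, so its ceiling moves far less.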

        That separation would also allow for cheaper scaling: you can run just a few micro instances with little memory and a single vCPU and async workers will be perfectly happy with that.

        Finally, appealing to authority and “sysadmin lore” should be out of scope for a benchmark that tries to be objective. For every Etsy I can give you a Facebook that moved entirely to an async request model, including Instagram, which is using Python 3. And Nginx, which you’re using yourself in your benchmark, was a big upgrade over Apache largely because of its single-threaded async model vs. a pre-fork server.

        You also need to be careful whose authority you’re appealing to. Quoting Nathaniel J. Smith pointing out deficiencies of asyncio loses its intended strength when you add that he is such a strong proponent of asynchronous programming that he created his own framework. That framework, Trio, is a fantastic research environment; it has already informed the evolution of asyncio and I’m sure it will keep doing so. That’s the point: Nathaniel’s posts aren’t saying “stop using async programming”. They are saying “here’s how we can make it better”.

        1. 2

          The memory point is fine - for sure less memory is used. How important that is depends on deployment, as traditionally memory usage is not a huge problem for webservers. I contend: not very important for most people.

          I don’t accept the implication that I need to build an HA postgres cluster with backups and replication chains and whatnot in order to test. That would raise the goalposts so high that constructing a benchmark would be a huge amount of effort and cost for anyone. If you’re aware of a cache of publicly available benchmarks that meet your exacting criteria in this respect, referencing them would be great.

          Going to the harder nut of that - the lower latency from running on the same machine - I am doubtful about how much it matters. Adding more blocking IO operations is simply not going to help because (as I stated elsewhere on this page) IO model just does not seem relevant to throughput for “embarrassingly parallel” tasks like webservers. The fact that uWSGI is native code is the biggest determinant of throughput. For response times, of course, doing something else while waiting actually seems to hurt - an event loop doesn’t schedule its tasks as fairly as the kernel scheduler schedules processes.
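
          To illustrate that scheduling point with a toy asyncio script of my own (deliberately contrived, timings approximate): one coroutine that computes without awaiting stalls every other task on the loop, where the kernel would simply have preempted a greedy process.

          ```python
          import asyncio
          import time

          async def cpu_hog():
              # 500 ms of work with no await: the event loop cannot run
              # anything else in the meantime.
              t = time.perf_counter()
              while time.perf_counter() - t < 0.5:
                  pass

          async def ping():
              # A 10 ms sleep that should resume promptly...
              start = time.perf_counter()
              await asyncio.sleep(0.01)
              # ...but only resumes once the hog finishes: ~0.5 s elapsed.
              print(f"ping took {time.perf_counter() - start:.3f}s")

          async def main():
              await asyncio.gather(ping(), cpu_hog())

          asyncio.run(main())
          ```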

          Nginx using async is fine - everyone seems to think that nginx works ok, and the Python community did not have to rewrite a large portion of its ecosystem in order to switch from apache2 to nginx.

          On the subject of sysadmin lore - I’m afraid that I don’t agree that it is out of scope! I’m not bound by intergalactic law to consider only my own evidence, and I think it’s probably a good idea to weigh outside evidence alongside what I have available myself - after all, it’s not as though I will have many opportunities to replicate multi-year programmes of software engineering in a cleanroom environment.

          Thanks for taking the time to find me on a different medium in order to respond.

      2. 1

        I mean you really shouldn’t go past the title here.

        The claim that sync code would somehow be faster is absurd in its own right: unless your program has absolutely zero IO wait, the async overhead will always be lower than the benefits.
        The only real argument here is that the increased code complexity increases the likelihood of faults.
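
        A minimal sketch of that claim, assuming a 100 ms IO wait per call:

        ```python
        import asyncio
        import time

        async def fake_io():
            # Stand-in for a network call with ~100 ms of wait.
            await asyncio.sleep(0.1)

        async def main():
            t = time.perf_counter()
            for _ in range(10):
                await fake_io()  # back to back: ~1.0 s total
            print(f"serial:     {time.perf_counter() - t:.2f}s")

            t = time.perf_counter()
            await asyncio.gather(*(fake_io() for _ in range(10)))  # ~0.1 s
            print(f"concurrent: {time.perf_counter() - t:.2f}s")

        asyncio.run(main())
        ```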

        1. 2

          The claim that sync code would somehow be faster is absurd in its own right: unless your program has absolutely zero IO wait, the async overhead will always be lower than the benefits.

          Maybe true in python, I don’t know. Demonstrably untrue for high-throughput work on servers with high core counts due to the locking overhead.

          1. 1

            And yet it is faster and I try hard to explain why in the body of the article (which of course I recommend strongly as the author of it :)). To briefly recap:

            1. IO model is irrelevant, as OS-scheduled multi-processing solves the problem of embarrassingly parallel workloads blocking on IO (see the sketch after this list)
            2. Use of native code matters a great deal and is otherwise the dominant factor
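
            A minimal sketch of point 1, with assumed worker counts and wait times: pre-forked sync workers that block on IO still keep the machine busy, because the kernel runs another process while each one waits.

            ```python
            import os
            import time
            from multiprocessing import Pool

            def handle_request(i):
                time.sleep(0.1)  # blocking "IO": this worker just waits
                return i * i     # the OS schedules other workers meanwhile

            if __name__ == "__main__":
                workers = (os.cpu_count() or 1) * 2  # assumed worker count
                with Pool(processes=workers) as pool:
                    t = time.perf_counter()
                    pool.map(handle_request, range(100))
                    # Wall time ≈ (100 / workers) * 0.1 s, although every
                    # single call blocks.
                    print(f"{time.perf_counter() - t:.2f}s")
            ```
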
            1. 1

              And yet it is faster

              To me it seems like really digging for minute edge cases.
              Async code, especially in Python, is about implicitly eliminating wait: I can deploy my app anywhere, on any machine, in any way, and it will always manage IO wait time optimally.