  2. 12

    I can see a big glaring methodological issue with the test. You said on the orange site that you have a 20 connection limit in pgbouncer, which is fine, but you’re running 5 async workers with 10 connections each. That means at any given point there are going to be ~50 connections hitting pgbouncer, with 30 of those just waiting in the queue. Contrast that with the 16 connection maximum of the sync workers, which means there are always 4 spare connections. I think this might account for some of the higher latency of the async workers.

    I’d also say to use asyncpg over aiopg. Also did you run all of the async workers using uvloop or just the ones that use it themselves by default?
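
    For what it’s worth, forcing uvloop everywhere is a two-liner. A minimal sketch, assuming uvloop is installed (it has to run before any event loop is created, e.g. at the top of each worker’s entry point):

    ```python
    import asyncio

    import uvloop

    # Make every event loop created from here on a uvloop one.
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())

    async def main():
        print(type(asyncio.get_running_loop()))  # <class 'uvloop.Loop'>

    asyncio.run(main())
    ```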

    1. 5

      Well, I think that the sync versions opened 16 workers * 4 (python-side) connections - per the SimpleConnection object’s constructor args - so 64 python-side connections. The async versions opened 10 * 5 (python-side) connections, so 50 python-side connections. As I also said elsewhere, these were multiplexed through pgbouncer with a 20 connection cap, so neither really had as many connections as they thought.

      I suspect that you are right and waiting for a database connection probably happened more for the async implementation, but I think that’s an inherent problem with async of this style. Even if you can cheaply create 1000 green threads, you still have to negotiate access to a small number of database connections. Even with a database with a lighter-weight concept of a connection (I’m thinking about MySQL here) you still cannot reasonably create 1000 connections. At any rate it certainly will not be faster (and the async implementations did not work well for me before I put pooling in).
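
      To make “pooling” concrete, here is a minimal sketch of the sort of thing I mean, using asyncpg as suggested above (the DSN and pool sizes are placeholders, not what the benchmark actually used):

      ```python
      import asyncio

      import asyncpg

      async def main():
          # Cap the python-side connections so they don't pile up
          # in pgbouncer's queue far beyond its 20 connection limit.
          pool = await asyncpg.create_pool(
              dsn="postgresql://user:secret@localhost/db",  # placeholder DSN
              min_size=2,
              max_size=10,
          )
          async with pool.acquire() as conn:
              print(await conn.fetchval("SELECT 1"))
          await pool.close()

      asyncio.run(main())
      ```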

      Kind of odd to split a discussion across two sites like this :) But lord the orange site is rough and I like it better here.

    2. 7

      I’m gonna pass over the problems of the methodology that have been pointed out before and stop right at the title: “Async Python is not faster”. The click-baitiness aside, it demonstrates a prevalent misconception that – as a card-carrying async aficionado – I’ve always found problematic.

      Asynchronous IO is about one thing only: a better usage of resources. Never has anyone claimed that your code will get magically faster by switching from preemptive multitasking to cooperative multitasking. The promise was always that if you get 1,000 instead of 10 simultaneous connections, your code will slow down linearly and not fall over. And in the case of higher-level languages like Python, you also get better ergonomics through explicit concurrency and nicer APIs (lol epoll, BTDT).

      So if you want to serve many clients – some of which have huge latencies – at once? Use async. You want to run I/O-bound code (doesn’t have to be network – black achieves some great feats with asyncio) concurrently with nice APIs? Use async. Long-lived connections like websockets in a GIL-ed runtime? Use async.

      But if all you do is get data from a database and serve it over HTTP? You’re gaining nothing, and you’re paying by having to sprinkle your code with async and await for no measurable gain. You also lose all the benefits of database drivers and SQLAlchemy using C to release the GIL. So yeah, even if the benchmarks weren’t flawed: they don’t matter if you choose the right tools for the job.
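
      To illustrate the sprinkling, a self-contained sketch (the sleeps stand in for driver calls; this is nobody’s real handler):

      ```python
      import asyncio
      import time

      # Sync: one plain function, and any blocking driver will do.
      def get_user_sync(user_id: int) -> dict:
          time.sleep(0.01)  # stand-in for a blocking database call
          return {"id": user_id}

      # Async: async/await must now appear at every level of the
      # call chain, and every library in that chain must cooperate.
      async def get_user_async(user_id: int) -> dict:
          await asyncio.sleep(0.01)  # stand-in for an async database call
          return {"id": user_id}

      print(get_user_sync(1))
      print(asyncio.run(get_user_async(1)))
      ```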

      The fact that async is a poor match for this job does not mean it’s bad for the jobs it was built for.

      1. 0

        Never has anyone claimed that your code will get magically faster by switching from preemptive multitasking to cooperative multitasking

        If only - performance claims from async web frameworks are in fact extremely common. I covered Vibora in the article. Starlette and Sanic both also make prominent claims in their documentation vs alternatives (which, let’s be honest, are Django and Flask). These claims were not proved out in my tests.

        The promise was always that if you get 1,000 instead of 10 simultaneous connections, your code will slow down linearly

        This promise seems very dubious because in fact I found, both in the benchmark and out of it, that the async frameworks dealt extremely poorly with high load.

        Moreover, the topic of dealing with connections rather than requests is in my opinion moot, because very few people terminate a TCP connection that has arrived over the internet with their Python program. Amazon’s ELB, HAProxy, nginx etc. are used for that. And then of course there is the question of whether defining an autoscaling group is a more appropriate solution for this worry than writing your application in a special way.

        So if you want to serve many clients – some of which have huge latencies – at once? Use async. You want to run I/O-bound code (doesn’t have to be network – black achieves some great feats with asyncio) concurrently with nice APIs? Use async. Long-lived connections like websockets in a GIL-ed runtime? Use async.

        I think some of this is fine as far as it goes. Using asyncio for a websocket service makes intuitive sense to me, and especially if you avoid doing any CPU work I think that will probably work fine. However, this is not as far as it goes - there is a profusion of general-purpose web frameworks and other tools that are clearly intended to do much more than just TCP connection management. That is the problem.

        1. 4

          performance claims from async web frameworks are in fact extremely common

          The boisterous claims of async frameworks have irked me for a long time, however the title of your post is not “async frameworks make misleading claims about performance” or “simple web apps don’t need async” but “Python async is not faster”. The irony of quoting NJS, an author of an async framework, as an argument against async in general has already been pointed out too.

          It seems like to me you were disappointed by the characteristics of a web app running asynchronously, drew the wrong conclusions, and quoted any material that remotely seemed to confirm your case – not even shying away from pulling gevent into the mix.

          I can assure you that watching people froth over async as the silver bullet for everything is just as frustrating from my end, however I’m afraid you took the wrong turn. I’d suggest having a look at what the original promises were and what good fits for them are – the comments here should have given you a few good pointers. If you judge a technology by what excited kids push to GitHub or Medium and compare it to reality, you’re always gonna be disappointed.

          This promise seems very dubious because in fact I found, both in the benchmark and out of it, that the async frameworks dealt extremely poorly with high load.

          Yes, but the problems with your benchmark have been discussed elsewhere so no need to reiterate them here.

          Moreover, the topic of dealing with connections rather than requests is in my opinion moot, because very few people terminate a TCP connection that has arrived over the internet with their Python program. Amazon’s ELB, HAProxy, nginx etc. are used for that.

          That is only true for short-lived, stateless HTTP requests, which async is only a mediocre fit for. Few people will argue this point. However, there are many more types of connections, the most common probably being websockets, and good luck handling those with a sync framework with more than ten clients. But I can assure you there are many more, and I wouldn’t want to miss async networking when dealing with them.
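
          The async version of “park thousands of mostly-idle websockets” really is a few lines. A minimal sketch with the websockets library (the port and echo behaviour are illustrative):

          ```python
          import asyncio

          import websockets

          async def echo(ws):
              # Each client costs a coroutine parked on IO,
              # not an OS thread or a whole worker process.
              async for message in ws:
                  await ws.send(message)

          async def main():
              async with websockets.serve(echo, "localhost", 8765):
                  await asyncio.Future()  # run forever

          asyncio.run(main())
          ```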

          1. 0

            Originally you claimed that

            Never has anyone claimed that your code will get magically faster by switching from preemptive multitasking to cooperative multitasking

            And now you admit that

            The boisterous claims of async frameworks have irked me for a long time

            I think you are right the second time and this is my feeling too.

            Re: NJS - for what it’s worth I didn’t quote him, but I wouldn’t feel bad if I did. I don’t think it’s wrong to surmise from the progression of asyncio -> curio -> trio that async is difficult. I am not “out to get” async but I do strongly dislike the chronic over-application of it - which, it sounds to me, you also recognise as a problem.

            1. 3

              I think the problem here is that I was talking about the people that built async APIs (epoll/kqueue/…) and low-level frameworks (asyncio, Twisted, trio, …) and you about applications/frameworks that build on them (not gonna name them to avoid unnecessary shaming).

              I absolutely see the problem of its misapplication, which is why I didn’t argue about the benchmarks at all: I don’t find them interesting for that use case because the use case isn’t interesting.

              But I also don’t see how your post conveys that point, either from reading it myself or from the reception it got.

      2. 10

        It’s not supposed to be faster, it’s supposed to be async. You don’t use it to go faster, you use it because you have a large number of clients and most requests spend most of their time waiting for outside events rather than computing, or because you’re using some piece of code during the request/response cycle that needs to be tied into an event loop, or just because it’s the reasonable way to do streaming responses.
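
        On the streaming point, an async generator maps onto it naturally. A rough sketch using Starlette (the route and chunks are illustrative; run it with an ASGI server such as uvicorn):

        ```python
        import asyncio

        from starlette.applications import Starlette
        from starlette.responses import StreamingResponse
        from starlette.routing import Route

        async def chunks():
            # Yield pieces as they become available instead of
            # buffering the whole body in one worker's memory.
            for i in range(5):
                await asyncio.sleep(1)
                yield f"chunk {i}\n"

        async def stream(request):
            return StreamingResponse(chunks(), media_type="text/plain")

        app = Starlette(routes=[Route("/stream", stream)])
        ```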

        1. 6

          What about memory benefits? We run 3 gunicorn / Django workers and let me tell you, in the right circumstances just those can bugger up a 2 GB RAM instance.

          1. 3

            Well, it’s true that async would use less memory, as there would be fewer instances. That said, most real-world apps are more CPU-bound than my example and you wouldn’t need 4 * cpu count workers to saturate the CPU. Many real-world apps need just 2 * cpu count workers, which on the machine I used would give each worker ~2 GB of RAM.

            That said, I would not want to use async for anything under load because of the latency problems I mentioned in the article. This seems to be a problem people run into a lot, and (in Python) it’s not easy to port your asyncio code to normal Python when you run into it.
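
            (For reference, that sizing is a one-line knob in a gunicorn config file; a sketch, with the multiplier being the figure discussed above:)

            ```python
            # gunicorn.conf.py
            import multiprocessing

            # 2 * cpu count suits more CPU-bound apps; the IO-heavy
            # benchmark handler needed 4 * cpu count to keep cores busy.
            workers = multiprocessing.cpu_count() * 2
            ```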

            1. 1

              I think we must be doing something wrong on our side then, thanks for the reply.

          2. 4

            I think Łukasz Langa, Python core developer, has some serious comments about the benchmark setup: https://twitter.com/llanga/status/1271719778324025349?s=19

            1. 3

              Thanks for linking this. A bit of rebuttal from me:

              1. As I stated in the article, I did try 4 async workers. Performance was worse than with 5 workers (though not hugely). I don’t have a potted explanation for this I’m afraid - I can only say for sure that using 4 async workers instead of 5 did not change the results for the better in asyncland. (Note: I tried varying the worker numbers for all frameworks individually, not as a collective).

              2. I take the point about running the whole thing on one machine. It would have been better if I hadn’t, of course. It seems unlikely that doing so would change the result, since load on the other components was so low. I would be keen to read of any benchmark results using such a multi-machine setup, particularly any that find in favour of async, as I don’t know of any. I would add, for anyone hoping to replicate my results (as a friend of mine did): it takes a lot of time. It’s not enough in my opinion to just throw up these servers in a naive manner; you need to make a good faith effort to tune and add infrastructure to improve performance. For example, when I ran the async servers without a connection pool they broke everything (including themselves).

              3. Beyond my own results, there is a chunky body of extant “sysadmin lore” that says: async is problematic under load. I reference a few of the publicly available reports in my article: from Etsy; claims from inside a ridesharing startup; etc. I have also had negative private experiences (prior to asyncio). The SQLAlchemy author wrote several years ago about this problem and kindly appeared in the HN thread to repeat his claims. The Flask author alluded to unfavourable private benchmarks, presumably from his workplace. The list goes on (including in other language communities).

              1. 4

                Hi.

                The point about not scaling above ~4 workers on 4 vCPUs has little to do with 4 vs 5 workers. It’s about being able to saturate your CPU cores with far fewer processes compared to sync workers.

                You could at least acknowledge in your post that sync frameworks achieve on-par performance by using more memory. It’s hard to do an exact apples-to-apples comparison, but the idea stands: async frameworks allow much denser resource usage.

                The reason why running your database with your Python process is not a realistic case goes beyond the operational problems with it (no high availability, no seamless scaling, hard upgrades and backups). The problem is that it unrealistically minimizes latency between the services. It doesn’t take much for the sync case advantage to go away as soon as you put the database on a separate box.

                That separation would also allow for cheaper scaling: you can run just a few micro instances with little memory and a single vCPU and async workers will be perfectly happy with that.

                Finally, appealing to authority and “sysadmin lore” should be out of scope for a benchmark that tries to be objective. For every Etsy I can give you a Facebook that moved entirely to an async request model, including Instagram, which is using Python 3. And nginx, which you’re using yourself in your benchmark, was a big upgrade over Apache largely because of its single-threaded async model vs. a pre-fork server.

                You also need to be careful whose authority you’re appealing to. Quoting Nathaniel J. Smith pointing out deficiencies of asyncio loses its intended strength when you add that he is such a strong proponent of asynchronous programming that he created his own framework. That framework, Trio, is a fantastic research environment and has already informed the evolution of asyncio, and I’m sure it will keep doing so. That’s the point: Nathaniel’s posts aren’t saying “stop using async programming”. They are saying “here’s how we can make it better”.

                1. 2

                  The memory point is fine - for sure less memory is used. How important that is depends on deployment, as traditionally memory usage is not a huge problem for webservers. I contend: not very important for most people.

                  I don’t accept the implication that I need to build an HA postgres cluster with backups and replication chains and whatnot in order to test. That would raise the goalposts so high that constructing a benchmark would be a huge amount of effort and cost for anyone. If you’re aware of a cache of publicly available benchmarks that meet your exacting criteria in this respect, referencing them would be great.

                  Going to the harder nut of that - the lower latency from running on the same machine - I am doubtful about how much it matters. Adding more blocking IO operations is simply not going to help because (as I stated elsewhere on this page) IO model just does not seem relevant to throughput for “embarrassingly parallel” tasks like webservers. The fact that uWSGI is native code is the biggest determinant of throughput. For response times, of course, doing something else while waiting actually seems to hurt - async workloads don’t get scheduled as fairly as the kernel scheduler schedules processes.

                  Nginx using async is fine - everyone seems to think that nginx works ok and the Python community did not have to rewrite a large portion of their ecosystem in order to switch from apache2 to nginx.

                  On the subject of sysadmin lore - I’m afraid that I don’t agree that it is out of scope! I’m not bound by intergalactic law to consider only my own evidence, and I think it’s probably a good idea to weigh outside evidence alongside what I have available myself - after all, it’s not as though I will have many opportunities to replicate multi-year programmes of software engineering in a cleanroom environment.

                  Thanks for taking the time to find me on a different medium in order to respond.

              2. 1

                I mean, you really don’t need to go past the title here.

                The claim that sync code would somehow be faster is absurd in its own right: unless your program has absolutely zero IO wait, the async overhead will always be lower than its benefits.
                The only real argument here would be that the increased code complexity increases the likelihood of faults.

                1. 2

                  The claim that sync code would somehow be faster is absurd in its own right: unless your program has absolutely zero IO wait, the async overhead will always be lower than its benefits.

                  Maybe true in python, I don’t know. Demonstrably untrue for high-throughput work on servers with high core counts due to the locking overhead.

                  1. 1

                    And yet it is faster and I try hard to explain why in the body of the article (which of course I recommend strongly as the author of it :)). To briefly recap:

                    1. IO model is irrelevant as OS scheduled multi-processing solves the problem of embarrassingly parallel workloads blocking on IO
                    2. Use of native code matters a great deal and is otherwise the dominant factor

                    1. 1

                      And yet it is faster

                      To me it seems like you’re really digging for minute edge cases.
                      Async code, especially in Python, is about implicitly eliminating wait. Meaning I can deploy my app anywhere, on any machine, in any way, and it will always choose to optimally manage IO wait time.
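
                      A minimal sketch of what eliminating wait means here: independent IO overlaps instead of queueing (the sleeps stand in for network calls):

                      ```python
                      import asyncio
                      import time

                      async def fake_io(name: str) -> str:
                          await asyncio.sleep(1)  # stand-in for a network round trip
                          return name

                      async def main():
                          start = time.perf_counter()
                          # The three 1s waits overlap, so this takes ~1s, not ~3s.
                          print(await asyncio.gather(fake_io("a"), fake_io("b"), fake_io("c")))
                          print(f"{time.perf_counter() - start:.1f}s")

                      asyncio.run(main())
                      ```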

                2. 9

                  If you’re concerned about speed, why are you writing in Python?

                  1. 17

                    For me the reason is that you can get good speed in Python while keeping all the benefits of having a language like Python. This works in many different fields. The scientific(/“data”) community use Python as a thin flexible shim over a mess of highly optimised native code.

                    I suspect that for most people who are writing web->database code like this, 5k requests/second off a machine that is 15 EUR/month is plenty.

                    1. 8

                      Path dependency - i.e. “this is how we got here”?

                      If - for whatever reason - your initial implementation is in Python, you can grow and scale quite a way.

                      At a decent scale, a 10-20% perf saving (tweaking your current infra) can be quite a rational thing to pursue, even if longer term plans involve a significant rewrite into another tech stack.

                      1. 8

                        Because you are also concerned about a few other things, as well as hoping for some speed.

                        1. 5

                          I really ought to write this up as a proper essay, but I see this kind of comment a lot, and I don’t get it at all. Of course performance matters, for any language!

                          Think of each quality a language can have (throughput, latency, predictability, library availability, speed of development, ease of deployment, safety, backwards compatibility, build time, etc…) as an axis. For each point in that space, different languages will be more or less fit for purpose. Each language ideally has a region along those various axes where it’s very good, perhaps even the best. As you move along the different axes, that language becomes less fit for purpose, and you start to make different choices about which to use.

                          If you go to the extreme of some axis, you’ll radically constrain the languages you consider. A web server that needs extremely low-latency with high throughput rules out a ton of languages, including Python. A web server of any kind is not going to be written in Bash.

                          However, even though Python is never going to be the language you choose for the most performance intensive jobs, improving Python’s performance will expand the region where Python is a good choice. The same goes for reducing the verbosity of Java, trying to write a shell without the foot-guns of bash, or every other seemingly quixotic improvement to a language.

                          Since Python doesn’t compete primarily on performance, it’s probably less important than improving the things that make Python great, but it is still valuable.

                          1. 4

                            I think it’s healthy to expect better performance from Python. I don’t see why it has to be relegated to the category of sluggish languages.

                          2. 2

                            Well, “just 18%” is a big difference and I don’t think that you’re testing the right things.

                            For instance, don’t use a database if you’re benchmarking web response times. Async database libraries are still figuring their shit out in Python. Other people have pointed out flaws in this experiment as well.

                            1. 1

                              What went wrong with Python’s speed? PHP managed to improve its speed a lot just by refactoring its codebase, and it doesn’t even have a JIT yet. Python has remained so slow that it’s just accepted as a fact now.

                                1. 1

                                  My understanding is that the VM avoids doing too many optimizations to keep the implementation simple. It’s a philosophical choice on the part of the designers.

                                2. 1

                                  So just this week I’ve shifted a pretty heavy service (~30K rps) over to R2DBC (Spring + Webflux). I saw this article popping up in various places and kind of refrained from commenting due to the technology differences. While I can already see gaps in connection management and worker pools (enough comments here point them out), I can back the idea of async IO being better on throughput and p99 latencies (avg might be off a little bit, but in production that doesn’t matter much).

                                  Would be more than happy to share numbers, but just to give a rough idea, our p99 dropped from 600ish ms to 80ish ms. The main reason was locks and choked threads that couldn’t fully use the CPU while keeping the train running. I was surprised by how hard you hit the wall after a particular amount of RPS (it depends on the scenario).

                                  I do plan to write a response blog post if I get time, but for folks taking this seriously: please don’t!