1.
  1.

    The async interface to the ORM is a biggie. I know a lot of folks have been waiting for that and I’ll bet my bottom dollar that lots of large Django sites end up bottlenecking in the DB.

    1.

      Having worked on my fair share of large Django sites: yes, the DB is usually a bottleneck but no, async querying doesn’t fix that.

      For example, I recently built out a little service using Starlite, one of the new breed of async-oriented Python frameworks. I started it out doing plain synchronous DB access, then switched to an async DB driver and async queries, and tested its performance under load (with Locust) at each step.
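
      (A Locust test for that kind of comparison can be a very small file. This is a minimal sketch; the `/items/` endpoint is just a hypothetical placeholder.)

      ```python
      # locustfile.py -- minimal Locust load test.
      # Run with: locust -f locustfile.py --host http://localhost:8000
      from locust import HttpUser, between, task

      class ApiUser(HttpUser):
          wait_time = between(1, 3)  # each simulated user pauses 1-3s between requests

          @task
          def list_items(self):
              self.client.get("/items/")  # hypothetical endpoint under test
      ```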

      And… switching to async everywhere didn’t offer much gain, which makes a lot of sense: if you’re already at the point where your DB is measurably holding things up, you don’t really gain by accepting even more pending requests that you can’t serve yet. The real performance boost, as always, came from introducing caching (in the form of Redis), which provided a several-hundred-percent improvement.
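
      (Something like this cache-aside sketch, using redis-py and with `get_user_from_db` as a hypothetical stand-in for the real query, captures the general idea:)

      ```python
      import json

      import redis

      r = redis.Redis(host="localhost", port=6379)

      def get_user(user_id: int) -> dict:
          # Cache-aside: try Redis first, fall back to the DB on a miss.
          cached = r.get(f"user:{user_id}")
          if cached is not None:
              return json.loads(cached)
          user = get_user_from_db(user_id)  # hypothetical slow DB query
          r.set(f"user:{user_id}", json.dumps(user), ex=60)  # cache for 60 seconds
          return user
      ```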

      1.

        Side note: from a reliability standpoint, a capacity cache is not an ideal solution. I’ve come across architectures where a latency cache was added but ended up becoming a capacity cache without anyone noticing.

        The obvious solution, in case anyone is trying to figure it out, is to shard writes across multiple DBs and/or scale out reads through read replicas (or shard the reads as well).
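
        (In Django specifically, the read-replica half of that maps onto the built-in database-router hook. A minimal sketch, assuming a second `"replica"` alias in `DATABASES` and a hypothetical `myapp.routers` module:)

        ```python
        # Assumes settings.py defines two DATABASES aliases -- "default" (the primary)
        # and "replica" (a read replica) -- plus
        # DATABASE_ROUTERS = ["myapp.routers.ReplicaRouter"].
        class ReplicaRouter:
            """Send reads to the read replica and writes to the primary."""

            def db_for_read(self, model, **hints):
                return "replica"

            def db_for_write(self, model, **hints):
                return "default"

            def allow_relation(self, obj1, obj2, **hints):
                return True  # both aliases point at the same underlying data
        ```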

        1.

          That’s been my experience as well: in most web applications pretty much every endpoint hits the database, so allowing Django to accept more requests at once doesn’t actually speed things up (unless you have endpoints that don’t hit the DB, which I’d expect to be the exception rather than the norm).

          However, one possibility I haven’t looked at yet is that async may make it easier to scale a single worker to more connections. The traditional way to scale a Python application is to run multiple processes, one for each concurrent connection. If a single process could instead use async to handle multiple connections at once without a slowdown, you could potentially use fewer machines to serve the same number of requests.
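
          (That’s essentially the model asyncio gives you. A toy illustration of one process, and one thread, keeping many I/O-bound requests in flight at once:)

          ```python
          import asyncio

          async def handle_request(i: int) -> str:
              # Stand-in for an I/O-bound call (e.g. a DB query): while one
              # coroutine waits here, the event loop runs all the others.
              await asyncio.sleep(0.1)
              return f"response {i}"

          async def main() -> None:
              # One process, one thread, 10,000 requests in flight at once --
              # no process-per-connection required.
              responses = await asyncio.gather(*(handle_request(i) for i in range(10_000)))
              print(len(responses), "requests handled")

          asyncio.run(main())
          ```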

          1.

            Straight conversion to async won’t help with response speed, but as soon as you have two databases, or a DB and a cache, you can get a latency improvement by querying them in parallel. Also, if you’re DB-bound, you can pack more async requests than threads onto the same runtime (usually; I know there will be exceptions).
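
            (A sketch of the parallel-query point, with `query_db` and `query_cache` as hypothetical async calls; the request’s latency becomes roughly the slower of the two instead of their sum:)

            ```python
            import asyncio

            async def handle(request_id: int):
                # Fire both backend lookups concurrently; the awaits overlap,
                # so the total wait is about max(a, b), not a + b.
                db_result, cache_result = await asyncio.gather(
                    query_db(request_id),     # hypothetical async DB query
                    query_cache(request_id),  # hypothetical async cache lookup
                )
                return db_result, cache_result
            ```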

            1.

              > And… switching to async everywhere didn’t offer much gain, which makes a lot of sense: if you’re already at the point where your DB is measurably holding things up, you don’t really gain by accepting even more pending requests that you can’t serve yet. The real performance boost, as always, came from introducing caching (in the form of Redis), which provided a several-hundred-percent improvement.

              That’s an excellent point. I think a lot of people see Python’s default synchronous nature, including the GIL, as a critical shortfall. In some cases it is, but in some cases it’s not, and there may be more “it’s not” than people think :)

              1.

                I don’t have a lot of experience here, but what I take from your feedback is that the RedisCache backend introduced in Django 4.0 should already have a big enough impact (see the Django 4.0 release notes), and the async interface is mostly a bonus?

                1.

                  Django has had the caching framework since the beginning, and has always shipped several backends that can talk to different caching data stores. Back in the day Memcached was the popular one; more recently Redis has been gaining, and a backend for using Redis as a cache was added in Django 4.0 (after a third-party one had been available for several years), but that wasn’t the beginning of caching as a concept in Django.
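
                  (For reference, the built-in backend added in 4.0 is configured like any other cache backend, and sits behind the usual `cache.get()`/`cache.set()` API:)

                  ```python
                  # settings.py -- the built-in Redis backend added in Django 4.0
                  CACHES = {
                      "default": {
                          "BACKEND": "django.core.cache.backends.redis.RedisCache",
                          "LOCATION": "redis://127.0.0.1:6379",
                      }
                  }
                  ```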

              2.

                I’ve worked with Django for a few years, and IIRC we would just spin up multiple gunicorn workers, each running its own Django process with its own DB connection. So there really wouldn’t be a bottleneck.

                It might be useful in cron tasks or other tasks started from the CLI. We’ve used threads for this in the past (making a new DB connection per thread), but async could possibly be faster there (and cause fewer resource issues).
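
                (A sketch of that thread-per-task pattern, inside, say, a management command where Django is already set up; `process_item` is a hypothetical stand-in for the real work. Django lazily opens a separate connection per thread, which the thread should close when it finishes:)

                ```python
                from concurrent.futures import ThreadPoolExecutor

                from django.db import connection

                def process_item(item_id: int) -> None:
                    # ORM calls here run on this worker thread's own,
                    # lazily opened DB connection.
                    ...
                    connection.close()  # release this thread's connection when done

                with ThreadPoolExecutor(max_workers=8) as pool:
                    pool.map(process_item, range(100))
                ```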