1. 7

    I’m gonna pass over the problems of the methodology that have been pointed out before and stop right at the title: “Async Python is not faster”. The click-baitiness aside, it demonstrates a prevalent misconception that – as a card-carrying async aficionado – I’ve always found problematic.

    Asynchronous IO is about one thing only: a better usage of resources. Never has anyone claimed that your code will get magically faster by switching from preemptive multitasking to cooperative multitasking. The promise was always that if you get 1,000 instead of 10 simultaneous connections, your code will slow down linearly and not fall over. And in the case of higher-level languages like Python, you also get better ergonomics through explicit concurrency and nicer APIs (lol epoll, BTDT).

    So you want to serve many clients – some of which have huge latencies – at once? Use async. You want to run I/O-bound code (it doesn’t have to be network – black achieves some great feats with asyncio) concurrently with nice APIs? Use async. Long-lived connections like websockets in a GIL-ed runtime? Use async.
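
    To make the concurrency claim concrete, here’s a minimal sketch (not from the article; the host, port choice, and client count are made up) of how asyncio serves many simultaneous connections from a single thread:

```python
import asyncio

async def handle(reader, writer):
    # Each connection is a cheap coroutine, not an OS thread, so a slow
    # client only parks its own coroutine instead of blocking a worker.
    data = await reader.readline()
    writer.write(data)  # echo the line back
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main(n_clients=100):
    # Port 0 lets the OS pick a free port for this demo.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    async def client(i):
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(f"hello {i}\n".encode())
        await writer.drain()
        line = await reader.readline()
        writer.close()
        await writer.wait_closed()
        return line

    # 100 simultaneous connections, all multiplexed on one thread.
    replies = await asyncio.gather(*(client(i) for i in range(n_clients)))
    server.close()
    await server.wait_closed()
    return replies

replies = asyncio.run(main())
```

    The point isn’t that any single request gets faster – it’s that adding the 100th slow client costs a coroutine, not a thread or a process.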

    But if all you do is get data from a database and serve it over HTTP? You’re gaining nothing, and you’re paying for it by having to sprinkle your code with async and await for no measurable gain. You also lose all the benefits of database drivers and SQLAlchemy using C to release the GIL. So yeah, even if the benchmarks weren’t flawed: they don’t matter if you choose the right tools for the job.

    The fact that async is a poor match for this job does not mean it’s bad for the jobs it was built for.

    1. 0

      Never has anyone claimed that your code will get magically faster by switching from preemptive multitasking to cooperative multitasking

      If only - performance claims from async web frameworks are in fact extremely common. I covered Vibora in the article. Starlette and Sanic both also make prominent claims in their documentation vs alternatives (which, let’s be honest, are Django and Flask). These claims were not proved out in my tests.

      The promise was always that if you get 1,000 instead of 10 simultaneous connections, your code will slow down linearly

      This promise seems very dubious because in fact I found, both in the benchmark and out of it, that the async frameworks dealt extremely poorly with high load.

      Moreover, the topic of dealing with connections rather than requests is in my opinion moot, because very few people terminate a TCP connection that has arrived over the internet with their Python program. Amazon’s ELB, HAProxy, nginx etc. are used for that. And then of course there is the question of whether defining an autoscaling group is a more appropriate solution for this worry than writing your application in a special way.

      So you want to serve many clients – some of which have huge latencies – at once? Use async. You want to run I/O-bound code (it doesn’t have to be network – black achieves some great feats with asyncio) concurrently with nice APIs? Use async. Long-lived connections like websockets in a GIL-ed runtime? Use async.

      I think some of this is fine as far as it goes. Using asyncio for a websocket service makes intuitive sense to me, and especially if you avoid doing any CPU work I think that will probably work fine. However, this is not as far as it goes – there is a profusion of general-purpose web frameworks and other tools that are clearly intended to do much more than just TCP connection management. That is the problem.

      1. 4

        performance claims from async web frameworks are in fact extremely common

        The boisterous claims of async frameworks have irked me for a long time, however the title of your post is not “async frameworks make misleading claims about performance” or “simple web apps don’t need async” but “Python async is not faster”. The irony of quoting NJS, an author of an async framework, as an argument against async in general has been already pointed out too.

        It seems like to me you were disappointed by the characteristics of a web app running asynchronously, drew the wrong conclusions, and quoted any material that remotely seemed to confirm your case – not even shying away from pulling gevent into the mix.

        I can assure you that watching people froth over async as the silver bullet for everything is just as frustrating from my end; however, I’m afraid you took the wrong turn. I’d suggest having a look at what the original promises were and what good fits for them are – the comments here should have given you a few good pointers. If you judge a technology by what excited kids push to GitHub or Medium and compare it to reality, you’re always gonna be disappointed.

        This promise seems very dubious because in fact I found, both in the benchmark and out of it, that the async frameworks dealt extremely poorly with high load.

        Yes, but the problems with your benchmark have been discussed elsewhere so no need to reiterate them here.

        Moreover, the topic of dealing with connections rather than requests is in my opinion moot, because very few people terminate a TCP connection that has arrived over the internet with their Python program. Amazon’s ELB, HAProxy, nginx etc. are used for that.

        That is only true for short-lived, stateless HTTP requests, which async is only a mediocre fit for. Few people will argue that point. However, there are many more types of connections, the most common one probably being websockets – and good luck handling those with a sync framework and more than ten clients. I can assure you there are many more, and I wouldn’t want to miss async networking when dealing with them.

        1. 0

          Originally you claimed that

          Never has anyone claimed that your code will get magically faster by switching from preemptive multitasking to cooperative multitasking

          And now you admit that

          The boisterous claims of async frameworks have irked me for a long time

          I think you are right the second time and this is my feeling too.

          Re: NJS – for what it’s worth, I didn’t quote him, but I wouldn’t feel bad if I did. I don’t think it’s wrong to surmise from the progression of asyncio -> curio -> trio that async is difficult. I am not “out to get” async but I do strongly dislike the chronic over-application of it – which, it sounds to me, you also recognise as a problem.

          1. 3

            I think the problem here is that I was talking about the people that built async APIs (epoll/kqueue/…) and low-level frameworks (asyncio, Twisted, trio, …) and you about applications/framework that build on it (not gonna name them to avoid unnecessary shaming).

            I absolutely see the problem of its misapplication, which is why I didn’t argue about the benchmarks at all: I don’t find them interesting for that use case because the use case isn’t interesting.

            But I also don’t see how your post conveys that point, either from reading it myself or from the reception it got.

    1. 4

      I’d generalize this a bit more and say “you should treat tests like any other code”. This includes making sure it’s understandable (using e.g. documentation), but also things like running your linters and generally just making sure it doesn’t become this kludge I’ve seen far too often.

      1. 5

        So, I generally agree with this, but there are some important differences.

        For instance: Test code is not usually itself tested. If you fail to hook up a feature correctly, users will notice it isn’t there; but if you fail to hook up the test code correctly, it’ll never fail.

        As a result, it’s valuable to keep test code very simple, even if doing so generates verbosity. Multiple times I’ve discovered tests that are either not running at all or not executing any checks, because some clever abstraction was introduced to make them easy to write.
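
        A toy sketch of that failure mode (the function and case names are invented for illustration): the table-driven test silently passes when its case list ends up empty, while the verbose one cannot:

```python
def multiply(a, b):
    return a * b

# "Clever" style: one loop over a table of cases. If the table is ever
# emptied by a refactoring, the loop body never runs and the test still
# "passes" -- zero assertions executed, nothing fails.
CASES = []  # oops: someone moved the cases and forgot to put them back

def test_multiply_table():
    for a, b, expected in CASES:
        assert multiply(a, b) == expected  # never executed

# Verbose style: every assertion is spelled out and cannot silently vanish.
def test_multiply_plain():
    assert multiply(2, 3) == 6
    assert multiply(-1, 4) == -4
    assert multiply(0, 9) == 0
```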

        1. 2

          I’ve actually joked that some tests need tests themselves.

          Yeah, I agree tests should be very simple; overcomplicated tests and testing frameworks are one of the few things I have very strong negative opinions on. Not just because what you’re saying, but also because when writing tests you need to think of both the testing code and the code being tested. Reducing the cognitive load here really helps in my experience.

          1. 2

            when writing tests you need to think of both the testing code and the code being tested

            Conversely, thinking about both the code itself and how you will test it when you write the code can help you to end up with more testable code which avoids the need for overcomplicated tests.

            1. 2

              A test by itself doesn’t have to be overcomplicated to need an explanation. The connection between “what am I testing” and “how do I verify I have achieved it” can be, and often is, though.

              1. 1

                A test by itself doesn’t have to be overcomplicated to need an explanation

                Sorry. I didn’t mean my comment as “If you write testable code, your tests will be simple enough to be self documenting”, although given the context, I now see that it could be read as such. While it might be true that testable code leading to simpler tests might let you get away with poorer documentation, I wouldn’t encourage it. I agree with the idea that you should document tests.

                As I see it, the two issues (how simple your tests are vs. how well documented your tests are) are largely orthogonal.

              2. 1

                Perhaps; but it’s still a lot more to keep in your head.

        1. 6

          I don’t like these “you must add comments” policies as more often than not it leads to a lot of noise and duplicate or obsolete information. The test should be well written enough that what it does is obvious and, just like for regular code, comments should be added only when something may be unclear.

          Also, I don’t know about Python, but with most JavaScript unit test libraries, the test will be described directly in code:

          it('should find a character in a string', () => {
              expect('abcd'.indexOf('b')).toBe(1);
          }) 
          

          This is useful because that string, unlike comments, would show up when a test doesn’t pass.
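
          (For what it’s worth, pytest gives you roughly the same thing in Python: the test function’s name carries the intent and is printed in the failure report. A sketch:)

```python
# pytest-style test: the function name documents the intent and shows up
# when the test fails, much like the it(...) string in JavaScript.
def test_finds_a_character_in_a_string():
    # str.find mirrors JavaScript's indexOf (returns -1 when absent)
    assert "abcd".find("b") == 1
```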

          But the intent can’t always be summed up in a few words

          It implies that it can often be summed up in a few words, which in turn implies that most of the time comments would be useless.

          1. 9

            it('should find a character in a string'

            This is pretty much exactly what the article is arguing for. The rest is just a matter of syntax and tooling features, so why the contrarianism? There’s an entire paragraph talking about avoiding the kind of bad comments you mention.

            This is useful because that string, unlike comments, would show up when a test doesn’t pass.

            Ironically, Python’s unittest module (the one in the standard library; fortunately most people use pytest nowadays) does that with docstrings, which made CPython Core ban their use in tests because they found it confusing.

            It implies that it can often be summed up in a few words, which in turn implies that most of the time comments would be useless.

            This wildly depends on the project and type of code. It’s also really difficult to judge right now – when the code is fresh on your mind – what will confuse you in a year. At least that was my experience.

          1. 2

            Random 16 bit numbers that are prefixed by the type and that can be translated offline to private IP addresses:

            E.g. c-1000 is a container whose IP address ends with 10.0. So if 10.1.0.0/16 is the network for containers, this one’s main address would be 10.1.10.0.

            1. -1

              The best SRE recommendation around Memcached is not to use it at all:

              • it’s pretty much abandonware at this point
              • there is no built-in clustering or any of the HA features that you need for reliability

              Don’t use memcached, use redis instead.

              (I do SRE and systems architecture)

              1. 30

                … there was literally a release yesterday, and the project is currently sponsored by a little company called… [checks notes]… Netflix.

                Does it do everything Redis does? No. Sometimes having simpler services is a good thing.

                1. 11

                  SRE here. Memcached is great. Redis is great too.

                  HA has a price (Leader election, tested failover, etc). It’s an antipattern to use HA for your cache.

                  1. 9

                    Memcached is definitely not abandonware. It’s a mature project with a narrow scope. It excels at what it does. It’s just not as feature rich as something like Redis. The HA story is usually provided by smart proxies (twemcache and others).

                    1. 8

                      It’s designed to be a cache, it doesn’t need an HA story. You run many many nodes of it and rely on consistent hashing to scale the cluster. For this, it’s unbelievably good and just works.

                      1. 3

                        seems like hazelcast is the successor of memcached https://hazelcast.com/use-cases/memcached-upgrade/

                        1. 3

                          I would put it with a little bit more nuance: if you already have Redis in production (which is quite common), there is little reason to add memcached too and add complexity/new software you may not have as much experience with.

                          1. 1

                            this comment is ridiculous

                            1. 1

                              it’s pretty much abandonware at this point

                              i was under the impression that facebook uses it extensively, i guess redis it is.

                              1. 10

                                Many large tech companies, including Facebook, use Memcached. Some even use both Memcached and Redis: Memcached as a cache, and Redis for its complex data structures and persistence.

                                Memcached is faster than Redis on a per-node basis, because Redis is single-threaded and Memcached isn’t. You also don’t need “built-in clustering” for Memcached; most languages have a consistent hashing library that makes running a cluster of Memcacheds relatively simple.

                                If you want a simple-to-operate, in-memory LRU cache, Memcached is the best there is. It has very few features, but for the features it has, they’re better than the competition.
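
                                A minimal sketch of the consistent-hashing idea mentioned above (node names are invented; real client libraries add weighting, replication, and better hash functions):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map keys to nodes so that adding or removing a node only
    remaps a small fraction of the keys."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual nodes per real node
        self.ring = []            # sorted list of (hash, node) points
        for node in nodes:
            self.add(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        # place `replicas` virtual points on the ring for this node
        for i in range(self.replicas):
            self.ring.append((self._hash(f"{node}:{i}"), node))
        self.ring.sort()

    def node_for(self, key):
        # first ring point clockwise from the key's hash
        idx = bisect.bisect(self.ring, (self._hash(key),)) % len(self.ring)
        return self.ring[idx][1]
```

                                Clients hash each cache key locally and talk straight to the owning Memcached node – no server-side clustering required.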

                                1. 1

                                  Just as an FYI: most folks run multiple Redis processes per node (CPU count minus one is pretty common), so the “single process” thing is probably moot.

                                  1. 5

                                    N-1 processes is better than nothing but it doesn’t usually compete with multithreading within a single process, since there can be overhead costs. I don’t have public benchmarks for Memcached vs Redis specifically, but at a previous employer we did internally benchmark the two (since we used both, and it would be in some senses simpler to just use Redis) and Redis had higher latency and lower throughput.

                                    1. 2

                                      Yup. Totally. I just didn’t want people to think that there’s all of these idle CPUs sitting out there. Super easy to multiplex across em.

                                      Once you start wanting to do more complex things / structures / caching policies, then it may make sense to use Redis.

                                      1. 1

                                        Yeah agreed, and I don’t mean to hate on Redis — if you want to do operations on distributed data structures, Redis is quite good; it also has some degree of persistence, and so cache warming stops being as much of a problem. And it’s still very fast compared to most things, it’s just hard to beat Memcached at the (comparatively few) operations it supports since it’s so simple.

                            1. 7

                              I’m currently a Python dev (apparently this is the most recent turn my career has taken), and I’m really bummed out by its web story outside of Django.

                              My last gig was Elixir, before that Node, and some Rails and Laravel in there. The tooling in the Python ecosystem, especially around migrations and dependency management, just feels clunky.

                              It singlehandedly sold me on Docker just so I didn’t have to mess with virtualenvs and multiple runtimes on my system and all of that. Like, what happened? Everybody groused about 2-to-3 (which is still hilarious) but like even without that I feel like the ecosystem has been vastly outstripped by “worse” technologies (see also, NodeJS).

                              1. 4

                                It singlehandedly sold me on Docker just so I didn’t have to mess with virtualenvs

                                One thing that made virtualenvs almost entirely painless for me was using direnv: in all my python project directories I have a bash script named .envrc that contains source .venv/bin/activate, and now cd-ing in/out of that directory will enter/exit the virtualenv automatically and instantaneously. It’s probably possible to set it up to switch pyenv environments as well.

                                1. 3

                                  One of the reasons why Python packaging still feels so clunky compared to other ecosystems is that the Python ecosystem is a lot more diverse thanks to e.g. the scientific stack that has very different needs than the web peeps so there’s never gonna be an all-encompassing solution like Cargo. Pipenv tried and failed, poetry is carving a niche for itself.

                                  But the primitives are improving. pip is currently growing a proper resolver, and doesn’t e.g. Ruby still need a compiler to install binary packages? As long as you don’t use Alpine for your Docker images, Python’s wheels are great (they’re just a bit painful to build).

                                  1. 1

                                    How did pipenv fail?

                                    1. 4

                                      Short answer: it’s too complex, which makes it buggy, and there wasn’t a release in over a year. IOW: it’s collapsing under its own weight.

                                      Long answer: https://hynek.me/articles/python-app-deps-2018/

                                  2. 3

                                    The tooling in the Python ecosystem, especially around migrations and dependency management, just feels clunky.

                                    Currently working on a Rails app, coming from the Flask ecosystem. You have no idea how much I miss SQLAlchemy and Alembic.

                                    I agree about dependency management, but certainly not about migrations. Modifying models and auto-generating migrations works much better than the other way around for me.

                                  1. 3

                                    LOVE this article! Especially the offer to help folks in need to get more visibility. Good on you for being willing to put elbow grease into moving the needle!

                                    A question about your reactions to people writing about building in the cloud. Are you saying you’d like to see less of that, or that you think doing so is a bad idea to begin with?

                                    I ask because the former seems a perfectly reasonable preference, but I’d argue that the latter could be a reactionary stance we might think carefully before taking.

                                    The cloud is a GREAT tool for certain use cases and an AWFUL one for others. I’d love to see some of the hype and acrimony get stripped away so we could all just use the right tool for the right job and get on with our lives :)

                                    Your chances of me helping are increased if you’re part of an URM and/or if I find your topic interesting. Please accept my apology if I can’t help you specifically, but I’ll try to find time for as many people as my time permits.

                                    Just in case anyone else read this and feels too abashed to ask: URM is an acronym for Underrepresented Minority.

                                    1. 1

                                      A question about your reactions to people writing about building in the cloud. Are you saying you’d like to see less of that, or that you think doing so is a bad idea to begin with?

                                      Not at all! I’m just somewhat annoyed by the fact that, given a lot of public discourse is dominated by paid cloud advocates, you could get the impression that everyone is running their stuff in their clouds, on top of their products like hosted Kubernetes.

                                      That’s obviously wrong but they’re paid to give you that impression and I suspect that many people feel inadequate due to that despite having good reasons to run their stuff differently. We need to get those people too – not asking for exclusivity. :)

                                      1. 1

                                        I think you’re right, and I also think some people end up feeling like they SHOULD run their workloads in the cloud even when maybe doing an actual evaluation of their situation might serve them better. Heck, a cloud solution may well be exactly the right fit, but you won’t know until you really survey all the options and figure out what works best for you.

                                    1. 2

                                      We are one of those Python web services companies running on Docker (AWS Fargate). One of our big problems has been dependency management in our monorepo. We want to be able to build and deploy to production quickly and often (more than a dozen times per day), so we want our CI job to run in ~10 minutes or less and our deploys to run in ~30 minutes or less.

                                      We also care about reproducible builds, so we initially looked at pipenv, but it took 30 minutes just to resolve dependencies for any change in any container. Eventually we moved to https://github.com/pantsbuild/pants which has solved many problems, but it’s an awful piece of engineering that happens to do the job right so long as you never deviate from the happy path and you don’t need to do reasonable things like ask, “what is the unique hash of this version of my system? [so I can use that to tag my Docker images]”.

                                      In general, dependency management and ecosystem tooling is still a big pain, and we spent a lot of time to find something that worked only passably. Others have probably found a solution that works well, but we haven’t stumbled upon it yet (possibly because there isn’t enough attention devoted to running web services among the Python community?).

                                      1. 9

                                        Without knowing your details, I can’t give advice, but as a data point: I’ve personally found joy in the flexibility of pip-tools. I’ve blogged about it (well, I mostly blogged about why I don’t use Pipenv and poetry) in 2018 and updated it November 2019 with my personal workflow: https://hynek.me/articles/python-app-deps-2018/

                                        1. 2

                                          I haven’t even heard of pip-tools. Reading through your blog post now. Thanks for the recommendation!

                                      1. 3

                                        That’s really cool. Btw, I’ve heard about their plans to remove or significantly modify the GIL logic after 3.8.0, so that there would be a distinct interpreter lock for each thread. How is it going so far?

                                        1. 4

                                          Do you mean subinterpreters? Eric talked about it recently on a podcast: https://talkpython.fm/episodes/show/225/can-subinterpreters-free-us-from-python-s-gil

                                          3.8’s multiprocessing.shared_memory is almost as interesting because it allows “free” one-way communication between processes.
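
                                          A tiny sketch of what that looks like (attaching within the same process here for brevity; in practice the second SharedMemory call happens in another process that only received the block’s name):

```python
from multiprocessing import shared_memory

# Create a named block of shared memory and write into it.
shm = shared_memory.SharedMemory(create=True, size=16)
shm.buf[:5] = b"hello"

# Any other process can attach by name -- no pickling, no copying.
attached = shared_memory.SharedMemory(name=shm.name)
data = bytes(attached.buf[:5])

attached.close()
shm.close()
shm.unlink()  # free the block once everyone is done
```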

                                        1. 2

                                          I agree that macOS has arrived at a complexity that Apple apparently can’t handle anymore (which is very different to Linux’s problem in 2000, I’ve been there).

                                          But I’m gonna leave here that if you have problems with projectors, it’s probably the fault of the USB-C to HDMI converters. Which is most probably caused by USB-C/TB3 being a shit show so far. You can put it on Apple that they miscalculated the trajectory of USB-C, but I’ve seen other notebooks fail too, and honestly MacBooks still seem to cause the fewest problems at conferences. They got the same shit for dropping disk drives and going all-in on USB-A and it worked fine. There had to come a miss (no, I don’t consider dropping 3.5mm a miss; it’s a mixed bag at best). 🤷‍♂️

                                          FWIW I have a 2018 MBP with the Belkin USB-C to HDMI converter sold directly by Apple, have spoken at conferences on three continents, and I’ve had zero problems so far. Still, I always travel with my own HDMI and VGA adapters to conferences. I have already saved other speakers’ butts with them too, so I can very much recommend that – it might help you make friends. :)

                                          1. 2

                                            …but ask yourself why that USB-HDMI dongle is needed in the first place. I don’t need one because several laptops around here have a native HDMI interface.

                                            When form goes over function, function is lost.

                                            1. 2

                                              That’s not a uniquely Apple problem though. Hard to say if it’s caused by others aping Apple or whether it’s a natural progression, but the “naked robotic core” seems to be an ideal that is favored generally.

                                              USB-C/TB3 is, as far as I can tell, one of the biggest consumer-hostile failures of the tech industry in recent years: no good hubs, wonky dongles, five different cables that look the same but do different things. But that’s not on Apple (alone).

                                              1. 1

                                                The last MacBook Pro with an HDMI port was circa 2015 I think, the last year before Apple decided that a terrible keyboard and a single USB-C port ought to be enough for anybody. :p

                                                1. 2

                                                  Ironically, my current USB-C to HDMI dongle is more reliable than my 2014’s built-in HDMI port. At some point I started carrying an extra dongle to be sure too. ¯\_(ツ)_/¯ (Having to present from a stranger’s notebook is one of the biggest nightmares of most speakers.)

                                            1. 5

                                              My shell prompt is quite involved, but I’m using fish, so I’m not gonna paste the whole code here.

                                              But I’d like to share one change in approach, that I found a game changer:

                                              Make it two lines.

                                              The more stuff you put into your prompt, the longer it gets, so what I do is have one line with the path, git info, etc., and the second line just starts with the prompt symbol. So my actual prompt is always on the very left, no matter how long the path. Which also means I don’t have to shorten the path or do the other things I’ve seen people do.

                                              For example right now, it looks like this:

                                              ~/Work/secret-project on master|→25!11?1
                                              [2] ➤
                                              

                                              The git status on the top right means: 25 files staged, 11 changed, 1 untracked. The [2] is the return code of the last execution.

                                              Of course there’s lots of colors. :)

                                              1. 3

                                                That’s also my approach to command line prompt, with the difference that I use Bash (but recently I have been exploring fish as a daily driver as well). I can’t recommend it enough, especially given the fact that current monitors allow for giving up one line like that.

                                                1. 1

                                                  Similar here in fish:

                                                  ~/Code/somerepo master
                                                  ➫ 
                                                  

                                                  I only did a few small things after switching to fish:

                                                  ~/.config/fish/fish_variables (Set Vi mode)

                                                  SETUVAR fish_key_bindings:fish_vi_key_bindings
                                                  

                                                  ~/.config/fish/config.fish (ctrl-f to accept autocomplete suggestions in Vi mode)

                                                  bind -M insert \cf accept-autosuggestion
                                                  

                                                  Install pure

                                                  ~/.config/fish/conf.d/pure.fish:

                                                  _pure_set_default pure_symbol_prompt "➫"
                                                  _pure_set_default pure_symbol_reverse_prompt "➬"  # Shown in Vi edit mode
                                                  
                                                1. 7

                                                  I wrote a blog post on deploying Python applications back in 2012(!) and implementation details aside, it aged quite well and we’re using the same rough approach for everything that’s not running in our nomad cluster for some reason: https://hynek.me/articles/python-app-deployment-with-native-packages/

                                                   The core idea is to build a virtualenv with everything you need on a build server/CI and package it into a .deb/.rpm/.tar.gz. Configuration goes into Ansible, done. Gives you unlimited flexibility, rollbacks, and a whole battle-tested toolchain that you can rely on.

                                                  The concepts should be easily transferable to other languages and ecosystems.

                                                  1. 2

                                                    Great article, thanks!

                                                  1. 31

                                                    It is clear what Mozilla needs to do: […] Also Mozilla can take real responsibility and work together with the Internet community and create RFCs for make DHCPv4, DHCPv6 and Router Advertisements support DNS URLs instead of just IP addresses. Mozilla could also help developing support in the operating systems, if privacy was really a concern for Mozilla.

                                                    While I share some of the concerns, the belief that this is a viable alternative is the whole crux of the article.

                                                    Similarly to IPv6, not enough people are feeling any pain – or even inconvenience. It’s like how we had to live with a catastrophic TLS stack (remember OpenSSL 0.9.8?) until Heartbleed happened, because nobody cared except for cryptographers yelling into the void.

                                                    In my eyes Mozilla is trailblazing here in the hopes that others will follow once there’s a critical mass/adoption. And opinions about CloudFlare (at least it’s not an ad company) aside, it’s an approach that worked before so it might not be the worst idea. I’m sure we’ll get an independent DoH provider eventually if the tech sticks.

                                                    Also, as others have pointed out: browsers nowadays already are special. They use their own DNS caching, TLS libraries, and often trust databases. They are the main means for most people to interact with the internet, therefore they’ll always have that special status.