1. 29

  2. 21

    There is just such a huge ergonomics hit. On top of that, any time I’ve compared benchmarks between async and threaded code with the CPU frequency stabilized for repeatable results, async has usually resulted in lower throughput, and only in unrealistically low-CPU workloads have I sometimes measured latency improvements. Far worse ergonomics, more error-prone code, worse compiler inference, tons of dependencies, etc… for approximately equal or worse throughput and latency.

    Have other people run responsibly controlled benchmarks that show significant throughput improvements on modern server operating systems when using async? It’s kind of weird to me that people will go through all of this pain because some random person on the internet told them it was better, but it doesn’t seem like many people have seriously evaluated the costs and benefits.

    If you want, try out this echo example after disabling turbo boost and see for yourself:

    # build first, while turbo boost is still enabled
    cargo build --release --bins
    
    # disable turbo boost for repeatable results
    echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
    
    # start threaded and async servers
    cargo run --release --bin req_res_threads & # starts on port 7000
    cargo run --release --bin req_res_async & # starts on port 7001
    
    # see how long it takes to receive 100k echo round trips of 4k buffers from 10 concurrent clients
    time cargo run --release --bin req_res_sender -- 7000 10 # "bad" thread per client
    time cargo run --release --bin req_res_sender -- 7001 10 # async
    
    # re-enable turbo boost for more enjoyable computing
    echo 0 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
    

    On my Linux servers, threads tend to beat async throughput by 5-20%. The threaded version spawns a new thread per client; the async version uses a multi-threaded work-stealing executor.
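
    For a sense of how little code the thread-per-client side needs, here is a rough sketch of the shape of such an echo server (std only; the port and 4k buffer just mirror the numbers above, not the actual req_res_threads source):

    use std::io::{Read, Write};
    use std::net::{TcpListener, TcpStream};
    use std::thread;

    fn handle(mut stream: TcpStream) -> std::io::Result<()> {
        let mut buf = [0u8; 4096];
        loop {
            let n = stream.read(&mut buf)?;
            if n == 0 {
                return Ok(()); // client hung up
            }
            stream.write_all(&buf[..n])?;
        }
    }

    fn main() -> std::io::Result<()> {
        let listener = TcpListener::bind("127.0.0.1:7000")?;
        for stream in listener.incoming() {
            let stream = stream?;
            // one OS thread per client; the kernel scheduler does the rest
            thread::spawn(move || {
                let _ = handle(stream);
            });
        }
        Ok(())
    }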

    It’s interesting to strace the async version to see how many more syscalls can be generated when hammering on epoll.

    Anyway, can we please start measuring more? So much effort is being spent for negative gains :/

    1. 16

      It’s kind of weird to me that people will go through all of this pain because some random person on the internet told them it was better, but it doesn’t seem like many people have seriously evaluated the costs and benefits.

      It is fascinating how much damage the early-2000s “threads = slow, async = fast” FUD can do. Almost none of it applies anymore, anyway. It seems like this meme has somehow been imparted into the collective programmer (un)conscious, and it seems impossible to root out at this point. Remember Node.js being marketed as “everything is async, therefore very fast”?

      Even today, somehow, being async is revered as a great feature. The very reason for “being async” seems to have long been forgotten by the general public. I have first-hand experience with this phenomenon from teaching people Go. When they start learning, at some point, they ask the spooky question: “Does this call block a thread? I heard blocking is slow”, and a wise Go sage promptly answers: “Fear not! The Go runtime actually uses async under the hood!”, and just like that, the pupil’s worries about performance are gone! Poof! The async magic sauce makes everything go fast, so if it’s under the hood, we need not worry about performance at all.

      In fact, I believe “being async” and exposing such primitives is an enormous disadvantage, in every single department except perhaps performance, and, even in that case, there exist multiple facets, as the benchmark linked in the parent post shows. Look at the absolutely remarkable amount of code (and the complexity thereof) the author of the blog post has to write towards the end of the article, in order to do something that is, conceptually, very simple[1]. And what have we gained from this, in the real world? A few less MBs of memory sitting around unused per connection in a server?

      I’m not buying what the async people are selling. The async programming model is strictly worse. Async code does not compose well with “regular” code, and can be difficult to reason about. Writing and debugging the reactors is difficult, especially if you want a cross-platform one, since each platform (epoll, kqueue, event ports, IOCP) has its own quirks. Having visibility into these runtimes requires a tremendous amount of extra work, because none of the existing tools understand them. Meanwhile, a thread or a process calling read(2) works just about the same everywhere. But that would be too easy, wouldn’t it?
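
      To make the composition complaint concrete, here is a tiny sketch (assuming the tokio crate; the function names are made up for illustration). A plain synchronous caller can’t simply call an async function; it has to drag in an executor and block on it:

      async fn fetch_config() -> String {
          // pretend there is some async I/O in here
          "config".to_string()
      }

      // The synchronous world cannot just call fetch_config(); it has to
      // pull in a whole executor and block on it...
      fn sync_caller() -> String {
          let rt = tokio::runtime::Runtime::new().unwrap();
          rt.block_on(fetch_config())
      }

      // ...whereas a plain blocking call like read(2) composes with
      // everything, because it is just a function call.
      fn main() {
          println!("{}", sync_caller());
      }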

      When I look at much of what modern software development is like, I can’t help but feel like

      Anyway, can we please start measuring more? So much effort is being spent for negative gains :/

      alludes to a much deeper problem. You see it in many places, and the async cargo culting I tried to point out with this post is just one of them.

      [1] Here it is, in 8 lines of Go, and another ~20 for a complete working demo https://play.golang.org/p/B-ZmhxNIYPb

      1. 4

        In general, as someone who has worked a lot with servers in a language that doesn’t have async, Python[1], I will say that async programming is worse in many cases. I would not use it for anything outside the web domain. Unfortunately, at least in my career, web servers have eaten the world, and it turns out that async helps a lot there. With Python servers, it turns out that they are frequently only able to make effective use of around 40% of a typical cloud VM’s CPU[2]. If you start to get into the 60% CPU range, performance quickly degrades. This also comports with my experience of Ruby at a previous job. Note that I’m also ignoring the absolutely astounding growth rate of memory usage in a typical Python or Ruby code base, which effectively limits the number of worker threads that you can actually run.

        Now, most of this stuff where someone writes an ETL or some other trivial thing? Yeah, just use threads and traditional concurrency primitives. I’ll note though that the article here is literally a toy example to demonstrate the trait implementations and compiler errors, not an example of best practices.

        • [1]: I do know about Python async “stuff,” but for all intents and purposes Python does not have async.
        • [2]: Say, a c5.large on AWS.
        1. 2

          You do not need to expose an async interface to avoid the high memory consumption that a Python/Ruby system has though. Go and Erlang/Elixir are two fine examples; there are plenty of others.

          A multiprocess single-threaded synchronous dynamically-typed interpreted GC’d language is worst case for memory in a high-concurrency environment; it’s just one overhead after another. Python and Ruby are both technological dead ends in web dev for the reasons you pointed out.

          I suspect I’m drifting off from the topic at hand, however…

      2. 8

        I want to upvote you even more than I can. I think it’s sad that async took so much mindshare. There are few if any good HTTP servers, database clients, and other libs that don’t depend on tokio or some other runtime.

      3. 6

        For those who see this post and get discouraged about the complexity of async Rust - don’t give up. This is not a great example of async Rust at its most elegant. Maybe that’s because of async-std; I don’t know, as I’ve never used it. I have used tokio 0.2, which is great. Instead of reinventing std as async, I think it hits a sweet spot - it’s easy to spawn a new thread in which to do blocking IO and then await it using async syntax and std primitives. There are some macros that make life easy there as well.
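
        For example, here is a rough sketch of that blocking-IO-on-a-thread pattern (assuming tokio with its macros enabled; the file path is just an example):

        use tokio::task;

        #[tokio::main]
        async fn main() -> std::io::Result<()> {
            // run the blocking std I/O on tokio's blocking thread pool, then await it
            let contents = task::spawn_blocking(|| std::fs::read_to_string("Cargo.toml"))
                .await
                .expect("blocking task panicked")?;
            println!("read {} bytes", contents.len());
            Ok(())
        }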

        Of course there are gotchas and some rough edges, for example - if your program holds a mutable reference over an await point you’ll get a pretty cryptic error. There’s still work being done. But if you wrap your mutable things in Arc<Mutex<_>> and tend to write functional code, then it can be easy and simple. Go has no comparison. It simply lacks the types to get the job done. Go’s syntax for multithreaded channels looks promising on the outside, but there’s just not enough type checking for me to be confident that the program is doing what I think it is.
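
        A small sketch of that Arc<Mutex<_>> pattern (assuming tokio's async Mutex; the shared counter is just a toy example):

        use std::sync::Arc;
        use tokio::sync::Mutex;

        #[tokio::main]
        async fn main() {
            let counter = Arc::new(Mutex::new(0u64));

            let mut handles = Vec::new();
            for _ in 0..10 {
                let counter = Arc::clone(&counter);
                handles.push(tokio::spawn(async move {
                    let mut n = counter.lock().await;
                    *n += 1;
                    // holding tokio's async guard across an .await is fine;
                    // a std::sync::MutexGuard here would trip the Send check
                    tokio::task::yield_now().await;
                }));
            }
            for h in handles {
                h.await.unwrap();
            }
            println!("count = {}", *counter.lock().await);
        }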

        Also keep in mind that there are certain programming domains that are necessarily async, like the browser! One of rust’s design goals is to provide the journeyman rustacean with as many viable programming contexts as possible - server, desktop, microcontroller, browser, lambda, etc.

        I’ve been very pleased with rust and async using tokio and js-futures. It’s not all hype, there are real productivity boosts to be had.