The Original Sin of Rust async programming is making it multi-threaded by default. If premature optimization is the root of all evil, this is the mother of all premature optimizations, and it curses all your code with the unholy Send + ’static, or worse yet Send + Sync + ’static, which just kills all the joy of actually writing Rust.
I think virtually everyone:
a) Would complain if it weren’t multithreaded - hence why everyone enables multithreaded runtimes
b) Is absolutely fine with the tradeoff of typing a dozen or so extra characters on some trait bounds
The problem, of course, is that Tokio imposes this design on you. It’s not your choice to make.
Of course it is. You don’t have to use Arc, you can just copy your data or move it. You don’t have to use spawn, Rust lets you borrow data across .await points and pass references into async functions. It’s pretty awesome. You can use spawn_local. You can use a channel. Or, gasp, you can Arc it. Mutex is only needed for shared mutable state - I seriously almost never use it, but idk, it’s not that big of a deal?
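As a minimal sketch of that (assuming Tokio with the rt and macros features; the function is made up), data can be borrowed across an .await point with no spawn, no Arc, and no Send + 'static bounds:

```rust
// Sketch: no spawn, no Arc, no Send/'static bounds. The future borrows
// `lines` across the .await point and the borrow checker verifies it.
async fn count_nonempty(lines: &[String]) -> usize {
    let mut n = 0;
    for line in lines {
        if !line.is_empty() {
            n += 1;
        }
        // Hand control back to the executor; the borrow lives across the await.
        tokio::task::yield_now().await;
    }
    n
}

#[tokio::main(flavor = "current_thread")]
async fn main() {
    let lines = vec!["a".to_string(), String::new(), "b".to_string()];
    // Pass a plain reference into the async function; `lines` stays owned here.
    let n = count_nonempty(&lines).await;
    println!("{n} non-empty lines");
}
```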
However, I have little hope that the Rust community will change course at this point. Tokio’s roots run deep within the ecosystem and it feels like for better or worse we’re stuck with it.
Yeah, because almost no one actually wants to make the tradeoffs you’re talking about. I get that you care about this, but I don’t and I think most people don’t - most people are fine with the status quo of tokio.
In other words, the difference is negligible for most applications.
Couple things.
Async is more than performance. Async allows you to trivially write code in a way that the code can be scheduled cooperatively, such as allowing a json-lines parser to yield back control to the caller every N lines. Async allows you to trivially write “this request, given these parameters, should time out after N seconds” without exposing sockets to every function that might make a request. These aren’t small things IMO.
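For instance, a timeout can be imposed entirely from the caller (a sketch assuming Tokio; fetch_user is a made-up stand-in for whatever actually does the I/O):

```rust
use std::time::Duration;
use tokio::time::timeout;

// Hypothetical leaf function; it knows nothing about deadlines or sockets.
async fn fetch_user(id: u64) -> Result<String, std::io::Error> {
    tokio::time::sleep(Duration::from_millis(50)).await; // stand-in for real I/O
    Ok(format!("user {id}"))
}

#[tokio::main(flavor = "current_thread")]
async fn main() {
    // The caller alone decides the policy: this request gives up after 2 seconds.
    match timeout(Duration::from_secs(2), fetch_user(42)).await {
        Ok(Ok(user)) => println!("got {user}"),
        Ok(Err(e)) => eprintln!("request failed: {e}"),
        Err(_) => eprintln!("timed out"),
    }
}
```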
Async is more than latency. OS threads also take up memory. OS threads are amazing, don’t get me wrong - I’ve been screaming “OS threads are amazing for performance, you probably don’t need async for performance” for years, but I think it’s notable that there are other benefits in terms of performance than just latency.
Yeah some of us do need that performance :)
Instead, I would recommend to use async Rust only when you really need it. Learn how to write good synchronous Rust first and then, if necessary, transition to async Rust. Learn to walk before you run.
On this I certainly agree. Trying to over async-ify things is a bad idea, almost tautologically. Use it as needed. Don’t be afraid to use a thread (or spawn_blocking) when appropriate.
What makes you think most people would want it to be multithreaded? Node.js has a single-threaded event-loop, and it works just fine. Concurrency does not imply parallelism.
What’s great about their single-threaded runtime is that it doesn’t require any locking mechanisms.
It’s not about adding a dozen or so extra characters, either. Those trait bounds are there for a reason. Every trait bound is a limitation on the types you can use in that section. Not all types are Send + Sync + 'static, so you end up wrapping them in an Arc<Mutex<T>> and passing them around like that throughout your application. That decision can have a severe impact on the design of the whole system as trait bounds tend to spread across the call-chain.
You don’t have to use Arc, you can just copy your data or move it.
“Just copying your data or moving it” is a simplification here. While you can move data into a task, it must meet the aforementioned trait bounds. Because of these bounds, you often have to resort to using synchronization primitives. Simply “moving” is not an option because of the Sync bound.
Also, copying data might not always be feasible or efficient, especially for large or non-Copy types.
Using spawn_local or a local smol runtime is great and more people should know about it; but it does limit the libraries or tools you can use in your project. Many async libraries and dependencies are written with the assumption that tasks can be Send.
Yeah, because almost no one actually wants to make the tradeoffs you’re talking about. I get that you care about this, but I don’t and I think most people don’t - most people are fine with the status quo of tokio.
I don’t think most people know about these tradeoffs in the first place. If they were aware of the long-term implications of that decision on their project and the additional complexity that comes with this ecosystem, they would be more cautious. We should avoid drawing premature conclusions here. People might be fine with Tokio, but only because of a lack of viable alternatives or because they equate async Rust with performance, and so they blindly add it to their project without even thinking about the consequences.
Async allows you to trivially write code in a way that the code can be scheduled cooperatively
Async also allows you to trivially shoot yourself in the foot by forgetting to poll a future (which causes it to simply not run) or blocking a thread on a long-running task. Used in the cases you mentioned, I fully agree that it has its benefits, as long as async usage is limited in scope to a small part of the application. No need for #[tokio::main] for most cases.
I actually think we both fundamentally agree in that async is a powerful weapon to have – if used in moderation and for the right problems. As the author of lychee, I know that it has great benefits for performance, but it comes at a significant cost.
What makes you think most people would want it to be multithreaded?
The fact that the most popular runtime, by far, is multithreaded, despite there being many alternatives at the time when async was first kicking into gear.
Node.js has a single-threaded event-loop, and it works just fine.
Virtually no one in Rust is coming from Node thinking “Well, Node was fast enough for me”.
Concurrency does not imply parallelism.
No one said otherwise.
What’s great about their single-threaded runtime is that it doesn’t require any locking mechanisms.
No one requires locking unless you’re mutating state across threads. You can just not do that - Rust makes it super easy because mutability is something you can understand at compile time. I’ve almost never needed a mutex.
Not all types are Send + Sync + ’static, so you end up wrapping them in an Arc<Mutex> and passing them around like that throughout your application.
The vast majority of types can be sent across threads simply by moving them. Satisfying those bounds is rarely a problem for me. You do not need Mutex unless you need to mutate your state across threads. If you need to mutate state, ok, use a Mutex, or just move the data.
Almost all datatypes can be moved while satisfying those bounds: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=ce29f4c4ca2499fc0367a63ac64422ec
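Not the linked playground code itself, but a minimal sketch in the same spirit (assuming Tokio; the types are arbitrary): plain owned values moved into a spawned task, plus one Arc for the shared read-only piece and no Mutex anywhere.

```rust
use std::{collections::HashMap, sync::Arc};

#[tokio::main]
async fn main() {
    // Ordinary owned types: Send + 'static by construction, no wrappers needed.
    let name = String::from("example");
    let scores: Vec<u32> = vec![1, 2, 3];
    let mut index: HashMap<String, u32> = HashMap::new();
    index.insert("a".into(), 1);

    // Shared read-only data: an Arc, and no Mutex, since nothing mutates it.
    let config = Arc::new(String::from("read-only config"));
    let config_for_task = Arc::clone(&config);

    let handle = tokio::spawn(async move {
        // Everything above was simply moved (or Arc-cloned) into the task.
        println!("{name}: {} entries, cfg={config_for_task}", scores.len() + index.len());
    });
    handle.await.unwrap();
    println!("still usable on this side: {config}");
}
```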
While you can move data into a task, it must meet the aforementioned trait bounds.
Note that none of those types involve a Mutex, but I did throw an Arc in there for completeness.
Moving is an extremely easy way to satisfy these bounds for many types - the only case where that’s tricky is if you have references, which makes 'static hard to satisfy; for that, either clone the value or use a local task.
Also, copying data might not always be feasible or efficient, especially for large or non-Copy types.
So then Arc them or borrow them.
I don’t think most people know about these tradeoffs in the first place.
Are all of the people using tokio just not noticing? Are they ignorant? Or are they just… fine with it.
People might be fine with Tokio, but only because of a lack of viable alternatives
Tokio is the main runtime now and it was always an early contender since the early days of mio. But there have absolutely been alternatives, certainly they were there when async was starting to really become popular. People chose tokio.
Maybe they were just fools who blindly wanted multithreading, or maybe they just prefer it.
Async also allows you to trivially shoot yourself in the foot by forgetting to poll a future (which causes it to simply not run) or blocking a thread on a long-running task.
Yep. There’s a lint on by default for not polling a future, but you can store the future and forget to await it, certainly. Blocking can be an issue as well. Async is not perfect. FWIW it’s trivial to block an OS thread too, perhaps even more so.
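For reference, the lint in question is the warn-by-default unused_must_use on futures; a small sketch of both cases:

```rust
async fn do_work() {
    println!("working");
}

#[tokio::main(flavor = "current_thread")]
async fn main() {
    // The default unused_must_use lint fires here: the future is created
    // but never awaited, so do_work() never actually runs.
    do_work();

    // No warning here: the future is "used" by being stored, but until it
    // is awaited (or polled) it still does nothing.
    let stored = do_work();
    drop(stored);
}
```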
at a significant cost.
I think the only thing we are likely to disagree about is how significant that cost is. I think it’s minor, some people think it’s massive. A fun thing that we can agree on so much and yet draw different conclusions :)
What makes you think most people would want it to be multithreaded? Node.js has a single-threaded event-loop, and it works just fine. Concurrency does not imply parallelism. What’s great about their single-threaded runtime is that it doesn’t require any locking mechanisms.
Everything depends on context, of course. But at least in the context of request-processing services – e.g. anything that provides an HTTP server, or consumes events from a queue – both concurrency and parallelism are essentially table-stakes requirements. That is, fully utilizing every core on an N-core machine should be possible with a single process, it shouldn’t require N processes.
nginx (for example) “requires” N processes and fully utilizes every core on an N-core machine, arguably better than a tokio-based solution given it doesn’t have to deal with the inefficiencies of rust-async or the synchronization costs of work-stealing.
Sure, and the same is true of many other programs, like apache, or Unicorn (Ruby), etc. etc. But this model of parallelism is anachronistic, especially for application services, is, I guess, my point.
The answer to complexity of async is typically “oh, just use threads, they’re nice and simple!”
https://lib.rs started as a synchronous batch process, with sync I/O, and lots of rayon, and this setup did not work well for a server:
In a mixed workload of I/O and CPU-bound tasks, it’s hard to have full utilization of the CPU, and not overload network at the same time. Waiting threads “waste” the processing power of the threadpool. Making threadpools oversized is a mixed bag, because it can still have too many I/O-waiting jobs, and overload the network too.
Async has spawn_blocking, semaphores, and concurrent streams to manage workloads more precisely. Sync equivalents of these things don’t unblock their threads, unless you reinvent some callback-based async substitute.
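For example, a sketch (assuming Tokio with the sync and time features; fetch is a stand-in for network I/O) of capping in-flight requests with a semaphore while everything else stays free to run:

```rust
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::Semaphore;

// Stand-in for a network call.
async fn fetch(i: u32) -> u32 {
    tokio::time::sleep(Duration::from_millis(10)).await;
    i
}

#[tokio::main]
async fn main() {
    // At most 4 fetches in flight at once, no matter how many tasks exist.
    let limit = Arc::new(Semaphore::new(4));

    let mut handles = Vec::new();
    for i in 0..100 {
        let limit = Arc::clone(&limit);
        handles.push(tokio::spawn(async move {
            let _permit = limit.acquire().await.expect("semaphore closed");
            fetch(i).await
            // _permit drops here, releasing the slot for the next task
        }));
    }

    let mut total: u32 = 0;
    for handle in handles {
        total += handle.await.unwrap();
    }
    println!("sum = {total}");
}
```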
Use of locks in a threadpool is very risky (this includes blocking channels). If a task happening under a lock is dependent on another thread in the same threadpool, it is likely to deadlock. In a large project it becomes a challenge to keep separate threadpools with a strict hierarchy to avoid that (especially as every other Rust library just chucks stuff on the global rayon pool).
Async can in theory deadlock in a similar way too, but it’s much less likely, since the pool of tasks can be much larger, and locks can be pre-empted or timed out.
Cancellation with threads is hard and annoying. You need to weave some atomic boolean through every function call, and keep checking it. Some operations will block without ability to cancel immediately. Timeouts need to happen at the leaf functions. When the server is under high load and clients start to give up, you don’t want to make load worse by having aborted/no-longer-relevant jobs continue to run.
Cancellation in Rust’s async is automatic (too quick even! — no async destructors). Timeouts can be imposed top-down on any Future.
note: lib.rs is probably fine as it is; this is just theorizing. Feel free to ignore.
A compromise here could be 1) single-threaded, green thread runtime for IO and 2) multi-threaded, work-stealing runtime for compute. The green-thread aspect allows IO to keep the synchronous interfaces while avoiding compute threads “overloading/deprioritizing” network.
The next trick is for locks/channels to seamlessly integrate with the runtime: All synchronization primitives can be implemented using an Event. The event can detect if running inside green-thread using thread-locals and if not, block the OS thread normally. If so however, it can run other green threads / poll for IO to block instead. Lets you use normal sync-primitives without the risk of deadlock.
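A rough sketch of that idea (entirely hypothetical: the Event type and the green-thread hook are made up for illustration, and only the ordinary blocking path is actually implemented):

```rust
use std::cell::Cell;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

thread_local! {
    // Hypothetical: a green-thread runtime would set this to true on the
    // OS threads it drives; blocking-pool threads leave it false.
    static IN_GREEN_THREAD: Cell<bool> = Cell::new(false);
}

struct Event {
    signaled: Mutex<bool>,
    cond: Condvar,
}

impl Event {
    fn wait(&self) {
        if IN_GREEN_THREAD.with(|f| f.get()) {
            // On a green thread the runtime would park this task and run
            // other green threads / poll I/O until set() is called.
            unimplemented!("hand control back to the green-thread scheduler");
        } else {
            // On an ordinary OS thread, just block on a condvar.
            let mut signaled = self.signaled.lock().unwrap();
            while !*signaled {
                signaled = self.cond.wait(signaled).unwrap();
            }
        }
    }

    fn set(&self) {
        *self.signaled.lock().unwrap() = true;
        self.cond.notify_all();
    }
}

fn main() {
    let event = Arc::new(Event { signaled: Mutex::new(false), cond: Condvar::new() });
    let event2 = Arc::clone(&event);
    thread::spawn(move || event2.set());
    event.wait(); // takes the ordinary blocking path here
}
```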
Can’t think of a nice way to do cancellation like async without invalidating the assumption of “drop is called at end of scope”. Know of a way, but it would work better in Go.
You’re describing the process that led to the invention of async/await :)
The problem is when code running in the compute thread discovers it needs to fetch something. Especially if it’s in a leaf function deeply buried in the call stack.
detect if running inside green-thread using thread-locals and if not, block the OS thread normally.
But you can’t block compute threads! This is causing the exact problem of threadpool threads being wasted on idly waiting for I/O. It doesn’t matter how you block, whether that’s a blocking channel recv, blocking on condvar for an event, or I/O syscall directly. The thread has to keep running code.
If so however, it can run other green threads
Running green threads from compute thread is pointless (there’s already executor doing it), and it’s only more of the same problem: it is blocking on I/O.
You need to keep running compute jobs on compute threads. If you try to steal more compute work while waiting for the I/O result, you will run into a problem: the job you steal may do blocking operation too, and the next one too. You will be increasing stack usage with every started and paused job, until the stack overflows. Plus you can’t resume these on-stack jobs in an arbitrary order.
So to reliably run arbitrary other jobs on the compute thread, you need to unwind the stack first. This means returning from every function in the current call stack. But that is tricky, because you will have to resume the work after I/O completes. So before returning, you need to save the state of the computation, and later resume computation based on the state. In complex functions with multiple I/O calls this will be a state machine.
This saving and restoring of state is tedious to do by hand. And this is how async/await was born.
The compute threads are for compute and it should be relatively simple to isolate them (stuff like large scale parsing, sorting, and checksumming is synchronous and can be done in a spawn_blocking manner).
I should’ve clarified that the detection in the Event happens for any type of runtime (both the green thread one, and the blocking pool one) so that locks/channels work without risk on either. But if you have locks/channels being called on the blocking pool, that should point to unclear separation (compute hitting locks doesn’t scale).
A compute thread’s Event would run other compute tasks instead of green thread tasks (the green thread runtime is serial, so no work-stealing).
The compute threads are for compute and it should be relatively simple to isolate them
To isolate them properly, you’d need to know at the top level if any leaf call will ever need I/O, so this is the “functions have colors” issue of async. For example, the messiest case I have now is README rendering ends up in HTML sanitizer that also adds width/height to <img>, and I have an image network request in the middle of a gnarly recursive parser.
A compute thread’s Event would run other compute tasks instead of green thread tasks
To do that you need to do something about stack usage. If you just make calls to jobs that don’t unwind, it will cause a stack overflow (you can have thousands of “recurse then wait” jobs in the queue). This is the reason why golang runtime has stack switching, and why Rust’s async is “stackless”.
From the sound of it, it seems like you spawned N threads, each basically running main(), with lots of shared data and locks and other stuff to synchronize between each other. Yeah, that’s not going to work great. It sounds like a job for a pipeline/graph of workpools passing work via channels. Stream processing, basically.
Cancellation with threads is hard and annoying
With a stream-processing-like setup, every pool really just runs for work in work_rx { ... } and shuts down gracefully when all the upstream work producers are gone. Shutdown is often just dropping the upmost sender and waiting for the downstream-most receiver to return None.
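A minimal sketch of that shape with plain std threads and channels (the doubling worker is just a placeholder):

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (work_tx, work_rx) = mpsc::channel::<u32>();
    let (out_tx, out_rx) = mpsc::channel::<u32>();

    // One worker; a pool would share the receiving end behind Arc<Mutex<..>>
    // or use a multi-consumer channel such as crossbeam's.
    let worker = thread::spawn(move || {
        // Runs until every Sender is dropped, then the loop ends on its own.
        for work in work_rx {
            out_tx.send(work * 2).ok();
        }
        // out_tx is dropped here, so the downstream receiver sees the end too.
    });

    for i in 0..5 {
        work_tx.send(i).unwrap();
    }
    drop(work_tx); // graceful shutdown: the upstream sender is gone

    // Downstream just drains until the channel reports no more senders.
    for result in out_rx {
        println!("{result}");
    }
    worker.join().unwrap();
}
```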
But yes, IO cancellation is a strong side of async, so I would consider running an async executor for the network threadpool(s) only if it makes things easier or you want to drive tons of connections. There’s no problem sending stuff via channels between async and blocking parts, so one can mix and match them.
Async has spawn_blocking, semaphores, and concurrent streams to manage workloads more precisely.
The size of a workpool and depth of channels are for that.
If a task happening under a lock is dependent on another thread in the same threadpool, it is likely to deadlock.
That’s just a bad data/processing architecture, nothing to do with blocking or async. Just because async will allow “more threads” which will mask the issue, doesn’t change that fact.
In a mixed workload of I/O and CPU-bound tasks,
That’s why each worker pool should do one step of one nature. CPU pool can be sized up to num_cpus, IO threads often benefit from oversizing.
I wouldn’t dare to use stream processing for more than a few serial steps. In lib.rs the messy problems I have are more like exploration of the data, with chains of fallbacks for filling missing data from alternative sources. The process is very diverging, with varying depth of steps. Because I’m mainly pulling the data, it’s not as easy as channel.send(). I need the data to flow back. And I’m pulling to lazily compute only what is needed.
Cargo dependencies are recursive, with loops. The server has multiple kinds of pages, which fetch different, but overlapping, subsets of data. There’s no clear global order to processing steps that would guarantee no deadlocks.
Multiple independent message-processing loops communicating via channels are just an awful awful syntax for async. When the goal is not to block the thread until the answer arrives, the calling function must end at every “await”, which cuts the code into multiple pieces loosely connected by channels. It’s hard to keep track where the other ends of the channels are used, so it’s hard to even see what happens in what order. The lack of clarity of channels+goroutines is the reason I can’t stand Golang.
With await, those multi-step multi-goroutine processes are just imperative code. Data dependencies between the steps are obvious — the channels become local variables.
Because I’m mainly pulling the data, it’s not as easy as channel.send(). I need the data to flow back
There’s a pattern of sending work to a threadpool with a callback to respond with, and since Senders are cheaply clonable it works well and is efficient. One can easily extract a “network calling” threadpool (I tend to call them actor/service), send it work as an enum variant, with a place to send back the result.
It’s possible to have one central coordinator thread that just makes decisions about the next step, sends work, and collects responses on which it makes new decisions and dispatches new work.
It’s not technically a DAG, but the same rules apply. The moment the workers’ Receivers disconnect, the whole thing gracefully shuts down, etc.
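A sketch of that pattern (std channels and illustrative names; a real setup would have more request variants and a pool of workers):

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

// Work sent to the "network" service as an enum variant, carrying a place
// to send the result back.
enum Request {
    Fetch { url: String, reply: Sender<String> },
}

fn main() {
    let (req_tx, req_rx) = channel::<Request>();

    // The network worker/service: owns the Receiver, answers via `reply`.
    let service = thread::spawn(move || {
        for req in req_rx {
            match req {
                Request::Fetch { url, reply } => {
                    // Pretend to do the I/O.
                    reply.send(format!("contents of {url}")).ok();
                }
            }
        }
        // Exits when every request Sender has been dropped.
    });

    // Coordinator side: dispatch work, collect the response, decide the next step.
    let (reply_tx, reply_rx) = channel();
    req_tx
        .send(Request::Fetch { url: "https://example.com".into(), reply: reply_tx })
        .unwrap();
    let body = reply_rx.recv().unwrap();
    println!("{body}");

    drop(req_tx); // let the service shut down gracefully
    service.join().unwrap();
}
```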
Multiple independent message-processing loops communicating via channels are just an awful awful syntax for async.
I agree here that with complex flows, passing messages basically becomes implementing state machines, which is what async is an abstraction for. I admit that the async solution looks more reasonable in this case.
But first - there are not that many projects that work in this way.
And second - I personally think a threadpool is still going to be the better solution. With async you’ll basically have to slap semaphores in various places to achieve concurrency controls similar to what a threadpool’s thread count gives you. A logic controller dispatching work to dedicated “specific workload” threadpools does not seem like all that much work, and will be far easier to test and reason about than long and diverging async logic with various calls scattered through it.
You can test the logic by giving it events and checking the expected dispatched messages, etc. I bet even with current async code each “task” is basically a loop (or a handful of loops representing top-level state) that makes a decision, then does a couple of things, possibly conditionally, and returns to the top of the loop to make the next decision, because any other approach would break down in complexity at a certain point.
Things like saving progress in a database will become much easier with an “explicit state machine” as well. It’s just generally a more structured approach, while async gives you an implicit state machine that is not good for anything but executing or not (cancellation).
And with threadpools you don’t have to deal with async, at least in places where it doesn’t give you anything.
The lack of clarity of channels+goroutines is the reason I can’t stand Golang.
There’s so many more reasons to hate on Golang… :D
It’s possible to make a mess out of channels, but it’s not hard to have a clean and easy to draw and explain communication architecture.
May I ask why I/O and CPU-bound workloads are mixed? I was wondering if you could separate that by, say, having a separate worker pool for heavy tasks and sync through the DB. Scaling the thread pool might get easier this way, and it could help with the locking problems you mentioned?
lib.rs aggregates data from many sources, and applies some heavy processing. Some data is fetched from the crates.io API and GitHub, which are rate limited. Some data is from scanning git repos and parsing their content. Some data is queried from a database, some computed from the crates git index. Some things like crate similarity and ranking need all of these sources together. I have caching and batch jobs, but they’re in a fine balance between freshness and performance.
IME, for server-style applications at least, it’s difficult to avoid async. All the libraries you want to use seem to depend on it. We went way the hell out of our way to segregate the async libraries we just had to bring in, and use plain threads plus channels for everything else, but in the end we found that we were spending a bunch of complexity on just gluing the two worlds together. So we just converted everything over to async, and think back fondly to when we used to have useful stack traces.
Now you’ve got me thinking of some automated way to generate a wrapper for a given async library, which presents a conventional synchronous interface. It is not clear to me if that is possible.
I’m also a threads and channels kind of guy, not that I’ve had to deal with that lately.
Interesting thought! There are things like pollster, which could help you with that. Maybe even an attribute macro could work:
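Something along these lines, perhaps (a hand-written sketch of what such a macro might generate; pollster::block_on is a real API, while the wrapper and its name are made up):

```rust
// Sketch only: the macro is hypothetical, and this is the hand-written
// shape of the code it might emit.
async fn foo() -> u32 {
    42
}

// Hypothetical generated wrapper with a purely synchronous signature.
pub fn foo_blocking() -> u32 {
    pollster::block_on(foo())
}

fn main() {
    println!("{}", foo_blocking()); // no .await anywhere in the caller
}
```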
I don’t know of any library that does that yet. If there is a reason for not trying that, I’m not aware of it right now.
Can an attribute macro rename the original function? Supposing you have function foo() as in your example, can the macro rename that to be async_foo() and declare a new pub fn foo() { ... } which will have the synchronous function signature?
To accommodate the various async runtimes, maybe the attribute macro would also accept arguments such as tokio and async_std to indicate which one is in use.
Now you’ve got me thinking of some automated way to generate a wrapper for a given async library, which presents a conventional synchronous interface. It is not clear to me if that is possible.
Details aside, isn’t this what runtimes like Tokio essentially provide?
Details aside, isn’t this what runtimes like Tokio essentially provide?
No. Libraries like tokio provide a framework and async task executor to enable async programming.
I’m talking about the API an async library presents. In those cases, the code using the library needs to be async aware. Take, for example, the creation of a new server object in Tide:
https://docs.rs/tide/latest/tide/struct.Server.html
The server’s support for the HTTP get operation is an async function (a very simple one in this case).
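Along the lines of Tide’s own hello-world (a sketch assuming the tide and async-std crates):

```rust
// A minimal Tide server; note that the handler and everything that calls
// listen() must already be async.
#[async_std::main]
async fn main() -> tide::Result<()> {
    let mut app = tide::new();
    app.at("/").get(|_| async { Ok("Hello, world!") });
    app.listen("127.0.0.1:8080").await?;
    Ok(())
}
```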
The more I research, the more it seems clear to me that automatically generating wrappers around async libraries will be very difficult at best.
Tokio doesn’t provide a conventional synchronous interface over async code?
It depends what you mean by “conventional”; it has a block_on method so regular synchronous code can await a future. You might even argue that’s a convention. But I’d usually read “sync wrapper for async library” as meaning that the sync interface presented is isomorphic to the async library API, albeit minus futures.
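For instance, a sketch of bridging from synchronous code into async via block_on (assuming Tokio with the rt-multi-thread and time features; fetch_greeting is a stand-in for a call into some async library):

```rust
use std::time::Duration;

// Stand-in for a call into an async library.
async fn fetch_greeting() -> String {
    tokio::time::sleep(Duration::from_millis(10)).await;
    "hello".to_string()
}

// A plain synchronous entry point: the async runtime stays an internal detail.
fn main() {
    let rt = tokio::runtime::Runtime::new().expect("failed to build runtime");
    let greeting = rt.block_on(fetch_greeting());
    println!("{greeting}");
}
```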
TIL tokio is multithreaded by default. That seems like a poor choice. I don’t think the executor we use at work is multithreaded by default…
Not by default, but it gets enabled by the features “full” and “rt-multi-thread”. By default most of tokio is not enabled.
Most people, including me, expect it to be MT. You can change that in the features.
I’m curious about the “start with Tokio” advice at the end. I’m wondering why not smol?
My personal context in the space: I’ve written a bunch of Rust and a bunch of threaded ruby code but zero async Rust to date.
Author here. I have a lot of appreciation for smol and often suggest it to seasoned Rust users. That said, it’s crucial to recognize that Tokio is quite dominant in the ecosystem. A significant number of libraries are tailored exclusively for Tokio. While there are compatibility crates available, integrating them can feel like an uphill task.
Additionally, even if you personally sidestep Tokio, there’s a good chance a dependency you use might pull it in, leading to an indirect reliance. In comparison, smol’s ecosystem is not even 1/10th the size, with fewer projects built around it, and the available documentation isn’t as extensive.
My hope is that there’ll be a gradual shift towards favoring leaner runtimes. But such a transformation will demand time and foundational efforts. For the time being, for practical, “real-world” implementations, Tokio remains a safer bet.