1.

  1.

    I’m pondering the exact formulation of the rules given here. In E and its relatives, I/O is never allowed to block, so “sync” and “async” are misleading descriptors of behavior. In particular, there is no block_on(), which corresponds nicely to the observation in the article that E’s behavior is neither quite “sync” nor “async”.

    In E, functions are special cases of objects. Objects need not be present in the same heap at the same time, but can be eventual promises for objects which are not yet constructed. Promises are merely transparent forwarders with a basic state machine to manage their referents. We have two ways of invoking objects: Either we can “call” them by delivering a message to them immediately and blocking until answered, or “send” to them by enqueuing a message and building a promise for the answer. If we translate the given rules into forms more amenable to E, we get that:

    1. Every object is on a computer at some time.
    2. The ways of invoking an object depend on where the object is located relative to the invoker.
    3. Sends work for all objects, but calls only work for objects on the same computer at the same time.
    4. Sends are more painful than calls.
    5. Some core I/O functionality returns promises even when called.

    (4) is falsifiable by syntactic example: an E call looks like function.method(value) and an E send looks like function<-method(value), so a send is no more painful to write than a call. This is our escape from the otherwise-solid reasoning that the article contains: E doesn’t suspend its I/O monad above its synchronous core behaviors, but requires sends in order to even talk about I/O.

    At the same time, though, we do have a taxonomy of object references, so that instead of saying “blue” or “red”, or “sync” or “async”, we say “near” or “far”: Either objects are here and now, or they’re elsewhere or elsewhen.
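
    For readers who don’t know E, here is a loose Rust analogy of the call/send contrast (only an analogy with made-up names, not E’s actual mechanics): a call delivers the message now and blocks for the answer, while a send merely enqueues the message and hands back a promise for it.

    ```rust
    use std::sync::mpsc;
    use std::thread;

    // "Call": immediate delivery; block until answered.
    fn call(x: i32) -> i32 {
        x + 1
    }

    // "Send": enqueue the message and return a promise (here, a channel
    // receiver) for an answer that arrives elsewhere/elsewhen.
    fn send(x: i32) -> mpsc::Receiver<i32> {
        let (tx, rx) = mpsc::channel();
        thread::spawn(move || {
            let _ = tx.send(call(x));
        });
        rx
    }

    fn main() {
        let near = call(41); // here and now
        let far = send(41);  // a promise, resolved later
        assert_eq!(near, 42);
        assert_eq!(far.recv().unwrap(), 42); // resolve the promise
    }
    ```

    In E the difference between the two is a single operator (. versus <-); the sketch above fakes the machinery with a thread and a channel, but the ergonomic point is the same.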

    1.

      Reading this and some other articles before it, I’m starting to think that what itched me badly when I tried to learn some Rust, namely that I had trouble hiding complexity behind APIs, might be a conscious and intended decision of Rust’s authors and philosophy. In Go, the ideal as I see it is to focus on a slim and clear API, possibly sacrificing a bit of performance if the impact is relatively negligible (case in point: the garbage collector). Rust, by contrast, appears to go the full meat-and-bones gore way, much too often for my taste; but I’m starting to realize this may be exactly the consequence of treating performance and full control as the absolute king, kept in check only by safety (which would also explain the nervous tension in the community around the use of “unsafe”). As such, I’m struck with the thought that in Rust I should probably embrace the gnarliness and not worry too much about simplicity. Simplicity would still be nice to have, and achieving it is probably a demonstration of ultimate zen-grade mastery of the language, but in day-to-day coding I should treat it as a cute accident: give it a short, surprised smile if it shows up, but not let its charm lure me away from the oil and grease of the plumbing, finding instead another kind of pleasure in the intellectual appreciation that It Will Work Safely And Fast (modulo my ability to choose good algorithms, obviously).

      1.

        Yes, indeed. In “Why Not Rust?” there’s a nice example: in a GC language you would have

        ```rust
        struct Foo { bar: Bar }
        ```

        and the language would insert all the magic needed to make it happen. But it can be done in multiple ways, with different implications for performance, thread safety, and memory management, so Rust makes you choose:

        ```rust
        struct Foo { bar: Bar }             // owned, stored inline
        struct Foo<'a> { bar: &'a Bar }     // shared borrow (needs a lifetime)
        struct Foo<'a> { bar: &'a mut Bar } // exclusive borrow (needs a lifetime)
        struct Foo { bar: Box<Bar> }        // owned, heap-allocated
        struct Foo { bar: Rc<Bar> }         // shared ownership, single-threaded
        struct Foo { bar: Arc<Bar> }        // shared ownership, thread-safe
        ```
      2.

        > This one doesn’t correspond to a rule from the original article because JavaScript doesn’t support blocking sync functions.

        Well, it does; they’re just mostly pointless. For example, XMLHttpRequest has a synchronous mode. And the Node.js synchronous filesystem API probably simplifies writing basic single-threaded tools, but I don’t really write JavaScript, so I don’t know how much that gets used in practice.

        > Verdict: rule #3 applies because block_on() changes a blue function into something that is neither red nor callable from red.

        I had never thought about this, and it’s a great point.

        > Consider what happens if one async function behind spawn_blocking(|| block_on(…)) needs data from another async function started the same way in order to proceed. It is possible that the other async function cannot make progress because it is waiting for a slot in the thread pool to even begin executing.

        This can happen with any dependency between two threads in the same pool, or even between threads in different size-limited pools, not just when async functions get involved. Thread pools should always be unlimited and concurrency limits should be imposed at a higher level. Max thread pool sizes are a gigantic footgun, and you likely won’t notice any problems until the worst possible time: when you’re under heavy load. Especially since tests probably don’t fully saturate your thread pool.
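
        A minimal sketch of that lockup (assuming tokio, with the blocking pool deliberately capped at one thread so the hazard is deterministic):

        ```rust
        use tokio::runtime::{Builder, Handle};
        use tokio::sync::oneshot;

        fn main() {
            let rt = Builder::new_multi_thread()
                .max_blocking_threads(1) // the capped pool: our footgun
                .enable_all()
                .build()
                .unwrap();

            rt.block_on(async {
                let (tx, rx) = oneshot::channel::<u32>();
                let handle = Handle::current();

                // Task A occupies the pool's only thread, then blocks
                // waiting for a value that task B is supposed to produce.
                let a = tokio::task::spawn_blocking(move || {
                    handle.block_on(async { rx.await.unwrap() })
                });

                // Task B never gets a thread: the single slot is held by A.
                // A waits on B, B waits on A's thread. Deadlock.
                let _b = tokio::task::spawn_blocking(move || {
                    let _ = tx.send(42);
                });

                println!("{}", a.await.unwrap()); // never prints
            });
        }
        ```

        With an unbounded blocking pool the same code finishes immediately; the dependency only becomes a deadlock once the pool saturates.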

        1.

          > Thread pools should always be unlimited and concurrency limits should be imposed at a higher level.

          I want to believe

          > Max thread pool sizes are a gigantic footgun, and you likely won’t notice any problems until the worst possible time: when you’re under heavy load.

          I have seen this many times. Do you mind elaborating on what “at a higher level” looks like? Everything I’m imagining just moves the footgun to a different resource, and I’m not sure exhausting those is any better.

          1.

            For example, in a web app, check the number of currently executing requests and return early or queue if there are too many, as in the sketch below. This is equally important in async apps, which are usually willing to spawn an unlimited number of workers.
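
            A minimal sketch of that check (assuming tokio; Request, Response, process, and the limit of 100 are all illustrative stand-ins):

            ```rust
            use std::sync::Arc;
            use tokio::sync::Semaphore;

            struct Request;                       // stand-in request type
            enum Response { Ok, TooManyRequests } // stand-in response type

            async fn process(_req: Request) -> Response {
                Response::Ok // the actual work, runs on an uncapped pool
            }

            async fn handle(req: Request, limiter: Arc<Semaphore>) -> Response {
                match limiter.try_acquire() {
                    // The permit is held for the whole request and released
                    // when it drops.
                    Ok(_permit) => process(req).await,
                    // Saturated: shed load at the single entry point instead
                    // of letting a capped thread pool stall somewhere deeper.
                    Err(_) => Response::TooManyRequests,
                }
            }

            #[tokio::main]
            async fn main() {
                let limiter = Arc::new(Semaphore::new(100)); // illustrative limit
                assert!(matches!(handle(Request, limiter).await, Response::Ok));
            }
            ```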

            Ruby and Python apps do this somewhat naturally. They do have worker pool size limits, but there’s typically no way for an app to schedule more work on the request-handler worker pool except by making a request to itself. Likewise, few async job-queue libraries built for Ruby or Python web apps have “block on job” features at all, which prevents the issue. Even when they do, the whole point of using these libraries is to avoid doing blocking work in an interactive request.

            For those cases, a thread pool size limit isn’t too bad, because access to the thread pool is restricted. I’m talking more about thread pools you use directly, where you have direct access to the executor interface. Your threads might schedule other work, block on it, and then lock up the whole executor.

            Going back to Ruby/Python web apps: if you had a thread/subprocess pool for doing work exclusively within the request flow, you could and should make that pool infinite. The higher-level concurrency control is the app worker pool.

            So I guess I should rephrase: thread pools should be infinite, or have a single well-controlled point of entry that is known not to schedule threads with dependencies on each other. The examples in the post all involve multiple arbitrary points of entry to the same thread pool. Such pools should be unlimited, or they will inevitably develop lockup bugs at some point down the line.