1. 9
  1. 3
    1. 1

      Why would “fully asynchronous I/O” be a good idea?

      (Assuming the usual meaning of async = “programming w/out control flow”.)

      1. 6

        In general, it’s easy to implement a synchronous API on top of an asynchronous API, but not vice versa. Managarm, for example, implements the synchronous POSIX API on top of its own asynchronous API.

        1. 1

          It is impossible to implement a synchronous API on top of an asynchronous API in the most widely used programming language, JavaScript.

          If you have threads then yes, it might be possible, but why not use threads to begin with?

          1. 4

            The difference is that asynchronous I/O in JavaScript works only via callbacks. For an OS kernel it is trivial to provide a single synchronous completion-wait syscall, so all asynchronous I/O can be made synchronous by turning it into two steps: schedule the asynchronous I/O, then wait for that I/O to complete. This doesn’t require the application to be multi-threaded.
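
            A minimal sketch of that two-step shape in userspace JavaScript (an analogy to the schedule + wait syscall pair, not kernel code; assumes a Node.js runtime for the require call):

            ```javascript
            const fs = require("fs/promises");

            async function main() {
              // Step 1: schedule the I/O. Nothing blocks here; we just hold a
              // handle (a Promise) to the in-flight operation.
              const pending = fs.readFile(__filename, "utf8");

              // ...the program is free to do unrelated work here...

              // Step 2: wait for completion. From this point on the read looks
              // synchronous: execution resumes only once the data exists.
              const data = await pending;
              return data.length > 0;
            }

            main().then(ok => console.log(ok)); // prints true
            ```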

            1. 2

              It is impossible to implement a synchronous API on top of an asynchronous API in the most widely used programming language, JavaScript.

              I’m not sure I entirely understand what you mean. If you want to block on a fetch in JavaScript, you can simply await it. That makes it synchronous, does it not?

              There’s of course an event loop / scheduler that decides when to schedule your function’s execution, but the same is true of processes/threads on Linux.

              1. 1

                await is only possible in special contexts (at the top level of a module or within async functions). Now say, for example, that you want to use an API that requires a non-async function as a parameter. You can’t use await in there.
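
                The restriction shows up with any stock API that expects a plain function, e.g. Array.prototype.map: marking the callback async doesn’t help, because the caller then gets promises instead of values. A minimal sketch (assumes Node.js for require):

                ```javascript
                const assert = require("assert");

                // An async API we are forced to call from inside a callback.
                async function double(n) {
                  return n * 2;
                }

                // map calls its callback synchronously and keeps the raw return
                // value; `await` inside a non-async callback is a syntax error.
                const results = [1, 2, 3].map(n => double(n));

                // map did not wait: an array of pending Promises, not numbers.
                assert.ok(results[0] instanceof Promise);

                // The only way out is to go async ourselves, one level up.
                Promise.all(results).then(values => {
                  assert.deepStrictEqual(values, [2, 4, 6]);
                  console.log(values); // [ 2, 4, 6 ]
                });
                ```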

                1. 1

                  But isn’t that like saying “Now say for example you want to use an API that doesn’t do any context switches. You can’t make blocking IO calls in there.”?

                  1. 0

                    I am just saying that you can’t, in general, program async code as if it were sync. Not in JS.

                    You can do it in a language with threads (because a thread can be blocked anywhere, whereas async/await can only block in particular contexts).

                    P.S. I don’t think my example is frivolous. Let’s say the API in question does some sophisticated compute work and you can’t replace or modify it easily. But your requirements also force you to make an async IO call from the callback. Well, you can’t with async/await.

                    P.P.S. Context-switching behavior is usually not under the control of app programmers so I don’t really get your comparison.

                    1. 1

                      I’m just thinking out loud, essentially. I’m still on the fence about the whole function colors debate.

                      I think it’s interesting, though, that while the syntax of async/await is different, the semantics are essentially the same as traditional processes/threads and context switching. Until you introduce parallel-execution primitives such as Promise.all, at which point async/await becomes strictly more expressive.
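
                      A toy sketch of that extra expressiveness, with timers standing in for I/O (assumes Node.js): sequential awaits serialise the waits, while Promise.all overlaps them, which a single thread of blocking calls cannot express.

                      ```javascript
                      const { setTimeout: delay } = require("timers/promises");

                      async function sequential() {
                        const start = Date.now();
                        await delay(50);  // "I/O" number one
                        await delay(50);  // "I/O" two, starts after the first
                        return Date.now() - start; // ~100 ms
                      }

                      async function overlapped() {
                        const start = Date.now();
                        // Both waits are in flight at the same time.
                        await Promise.all([delay(50), delay(50)]);
                        return Date.now() - start; // ~50 ms
                      }

                      (async () => {
                        const seq = await sequential();
                        const par = await overlapped();
                        console.log(par < seq); // overlapping finishes sooner
                      })();
                      ```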

                      From this perspective, it seems like async IO is indeed a better foundation on which to build an OS.

              2. 1

                How are threads implemented? Microkernels run directly on top of the hardware. I don’t know much about this, but from reading a bit on the Hurd website, the issue seems to be that synchronous microkernels block a lot, whereas the async ones can get more done.

            2. 6

              You seem to be thinking in terms of language-level abstractions, not OS abstractions. Your definition is definitely not ‘the usual meaning of async’ in the context of systems programming. When you do synchronous I/O in an OS, the following sequence happens:

              1. The OS deschedules the calling thread.
              2. The OS notifies the relevant subsystem (e.g. storage, network) to begin processing the I/O.
              3. The relevant subsystem may return immediately if it has some cached value (e.g. disk I/O in the buffer cache, incoming network packets) but typically it issues some DMA commands to tell the hardware to asynchronously deliver the result.
              4. The scheduler runs some other threads.
              5. The I/O completes.
              6. The kernel wakes up the calling thread.

              The flow with asynchronous I/O is very similar:

              1. The OS allows the calling thread to remain scheduled after processing the request.
              2. The OS notifies the relevant subsystem (e.g. storage, network) to begin processing the I/O.
              3. The relevant subsystem may return immediately if it has some cached value (e.g. disk I/O in the buffer cache, incoming network packets) but typically it issues some DMA commands to tell the hardware to asynchronously deliver the result.
              4. The scheduler runs some other threads, including the calling thread.
              5. The I/O completes.
              6. The kernel either asynchronously notifies the calling thread (e.g. via a signal or by writing an I/O-completed bit into a userspace data structure) or waits for the thread to query the completion state with an explicit (blocking or non-blocking) call.

              Given the latter and a blocking wait-for-completion call, you can trivially simulate the former by implementing a synchronous I/O call as an asynchronous request followed by a blocking wait-for-completion. The converse is not true and requires userspace to maintain a pool of threads that exist solely for the purpose of blocking on I/O and waiting for completion.
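
              The asymmetry can be demonstrated even inside JavaScript itself (a toy model, assuming Node.js; the busy-wait stands in for a blocking synchronous call): an async wait yields to other work, but a synchronous block starves everything until it returns, which is why wrapping sync code in an async interface needs extra threads.

              ```javascript
              // A synchronous "I/O" call modelled as a busy-wait: nothing else
              // can run while it executes, because there is a single thread of
              // control.
              function blockingWork(ms) {
                const end = Date.now() + ms;
                while (Date.now() < end) { /* spin */ }
              }

              let timerFired = false;
              setTimeout(() => { timerFired = true; }, 10); // due in 10 ms

              blockingWork(100); // starves the event loop for ~100 ms

              // The 10 ms timer could not fire during the block...
              console.log(timerFired); // false
              // ...it only runs once the event loop gets control back.
              setImmediate(() => console.log(timerFired)); // true
              ```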

              If your program wants to take advantage of the asynchronous nature of I/O then it can perform other work while waiting for the I/O.

              Most OS interfaces are synchronous for two reasons:

              • They were designed before DMA was mainstream.
              • They originated on single-core systems.

              On DOS or early ‘80s UNIX, for example, if you wanted to read a file then you’d do a read system call. The kernel would synchronously call through the FS stack to find the right block to read, then would write the block request to the device’s I/O control registers and then sit doing a spinning read of the control registers to read each word that the device returned. There was no point making it async because there was no way of doing anything on the CPU other than polling the device. Even back then, this model didn’t work particularly well for things like networks and keyboards, where you may have no input for a while.

              With vaguely modern (late ‘90s onwards) hardware neither of these is really true. The kernel may synchronously call through the FS stack to get a block, but then it writes a DMA request to the device. The device eventually writes the result directly into memory and notifies the kernel (either via an interrupt or via a control register that the kernel periodically polls). The kernel can schedule other work in the middle. On a multicore system, all of the kernel’s work can happen on a different core to the userspace thread and so all of the FS stack work can happen in parallel with the userspace application’s work.

              There’s one additional dimension, which is the motivation for POSIX APIs such as lio_listio and Linux APIs such as io_uring: system calls can be expensive. In the simple async model outlined above, you potentially double the number of system calls, because each call becomes a dispatch + block sequence (or, worse, dispatch + poll multiple times). You can amortise this if you allow a single dispatch to start many I/O operations. You generally don’t want to do this with sync I/O, because if you had to wait, for example, until a network packet was received before seeing the result of a disk read, you’d introduce a lot of latency; APIs such as readv and writev do it for the one case where it is useful, multiple I/Os to the same descriptor. You can make the poll fast by making the kernel just write a completion flag into userspace memory, rather than keeping state in the kernel that you need to query.
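
              A userspace analogue of that batched submit + single wait, using Node’s promise-based fs as a stand-in for a batched-submission API such as lio_listio (the readAll helper is made up for illustration):

              ```javascript
              const fs = require("fs/promises");

              async function readAll(paths) {
                // Dispatch every read before waiting on any of them: one
                // "submission" per operation, but only a single wait for the
                // whole batch of completions.
                const inFlight = paths.map(p => fs.readFile(p, "utf8"));
                return Promise.all(inFlight);
              }

              // Self-contained demo: read this script three times over.
              readAll([__filename, __filename, __filename]).then(chunks => {
                console.log(chunks.length);                // 3
                console.log(chunks.every(c => c.length > 0)); // true
              });
              ```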

              Don’t conflate this with a language’s async keyword, especially not JavaScript’s. JavaScript has a run loop and event model tied into the language. It handles a single event to completion and then processes the next one. This is already asynchronous because if you had synchronous event polling then you’d block handling of any other event (you can already mess this up quite easily by spending too long servicing one event). The JavaScript async keyword does CPS construction to generate a handler for an event that captures all of the state of the things that happen after an await.
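
              That CPS construction can be made concrete with a simplified sketch (the real transform also handles loops, try/catch, and multiple awaits): an async function split at its await behaves like a hand-written .then continuation.

              ```javascript
              // An async function: the compiler effectively cuts it at the
              // await and packages the code after it as a continuation.
              async function withAwait(p) {
                const v = await p;
                return v + 1;
              }

              // Roughly the hand-written CPS equivalent: the continuation is
              // the callback passed to .then.
              function withThen(p) {
                return Promise.resolve(p).then(v => v + 1);
              }

              Promise.all([withAwait(Promise.resolve(41)),
                           withThen(Promise.resolve(41))])
                .then(([a, b]) => console.log(a === 42 && b === 42)); // true
              ```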