1. 13
  1.  

  2. 4

    POSIX mandates that every operation that creates a file descriptor returns the lowest unused descriptor number. This is because early UNIX systems had no dup2, so shells needed some other mechanism to redirect standard input, output, and error. These were defined as file descriptors 0, 1, and 2, so the shell could close them all, open new file descriptors in the right order, and know that everything was in the right place.
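
    The close-then-open trick can be sketched in C roughly as follows. This is a minimal illustration, not how any particular shell does it: the helper name `fd_after_closing_stdout` is made up for this example, and it saves and restores the original stdout (using dup and dup2, which the early shells did not have) purely so the demo leaves the process as it found it.

    ```c
    #include <assert.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Hypothetical helper: close fd 1, open `path`, and report which
     * descriptor number open() handed back. With the lowest-unused rule
     * and every lower number in use, that number is 1, i.e. stdout. */
    int fd_after_closing_stdout(const char *path)
    {
        assert(path != NULL);
        fflush(stdout);                    /* do not lose buffered output */

        int saved = dup(STDOUT_FILENO);    /* keep the real stdout around */
        if (saved < 0)
            return -1;

        close(STDOUT_FILENO);              /* fd 1 is now the lowest free slot */
        int fd = open(path, O_WRONLY);     /* so open() should return 1 */

        /* Clean up: drop the new descriptor if it landed somewhere else,
         * then put the original stdout back on fd 1. */
        if (fd >= 0 && fd != STDOUT_FILENO)
            close(fd);
        dup2(saved, STDOUT_FILENO);
        close(saved);
        return fd;
    }
    ```

    Called as `fd_after_closing_stdout("/dev/null")`, this should return `STDOUT_FILENO`, which is exactly the guarantee the old shells relied on.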

    This was mostly fine on single-threaded *NIX systems, but it became a problem as multithreading became more common. Now you have a file descriptor table shared by every thread in the process, and every system call that does any I/O has to access it through a mechanism that is safe under concurrent mutation (locks, RCU, whatever). Every operation that creates a new file descriptor must atomically (with respect to any similar operation) find and reserve the lowest free file descriptor number. For workloads with many short-lived file descriptors, this can easily become a bottleneck.
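
    To make the "find and reserve the lowest number atomically" requirement concrete, here is a toy user-space sketch. It is an assumption-laden illustration and not how any real kernel implements its descriptor table: the names `slot_table`, `alloc_lowest`, and `release_slot` are invented, and the table is limited to 64 slots so that it fits in one atomic word.

    ```c
    #include <assert.h>
    #include <stdatomic.h>
    #include <stdint.h>

    /* Toy table: bit i of in_use is set when slot i is taken. */
    struct slot_table {
        _Atomic uint64_t in_use;
    };

    /* Reserve the lowest free slot, or return -1 if all 64 are taken.
     * Every caller retries against the same word, which is where the
     * contention comes from when many threads allocate at once. */
    int alloc_lowest(struct slot_table *t)
    {
        uint64_t old = atomic_load(&t->in_use);
        for (;;) {
            if (old == UINT64_MAX)
                return -1;
            int slot = 0;
            while (old & (UINT64_C(1) << slot))
                slot++;                        /* first clear bit */
            uint64_t want = old | (UINT64_C(1) << slot);
            /* On failure the CAS reloads `old`, so the search restarts
             * against whatever another thread just changed. */
            if (atomic_compare_exchange_weak(&t->in_use, &old, want))
                return slot;
        }
    }

    /* Mark a slot free again so a later allocation can reuse its number. */
    void release_slot(struct slot_table *t, int slot)
    {
        assert(slot >= 0 && slot < 64);
        atomic_fetch_and(&t->in_use, ~(UINT64_C(1) << slot));
    }
    ```

    Even in this tiny version, every allocation and release has to touch the same shared word. An allocator that could hand out any free number could shard or cache per thread; the lowest-number rule rules that out.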

    The file descriptor table is also shared global state, which makes it problematic with respect to concurrency: if I am about to write to fd n, another thread can close that fd and, because of the above rule, the next fd that anything creates is likely to be fd n, so I will end up writing to the wrong thing. (In contrast, Windows HANDLEs are arbitrary pointer-sized quantities, so if you close one and then write to it you’re far more likely to get an error than to write to the wrong thing.)
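
    The number reuse behind this hazard is easy to observe even without threads. The sketch below uses a hypothetical helper, `fd_number_reused`, and assumes nothing else in the process opens or closes descriptors while it runs. It opens one file, closes it, opens a different file, and checks whether the second open got the same number back.

    ```c
    #include <assert.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* Hypothetical helper: returns 1 if the descriptor number freed by
     * closing `first_path` is handed straight back for `second_path`,
     * 0 if not, and -1 if either open() fails. */
    int fd_number_reused(const char *first_path, const char *second_path)
    {
        assert(first_path != NULL && second_path != NULL);

        int stale = open(first_path, O_RDONLY);
        if (stale < 0)
            return -1;
        close(stale);                  /* `stale` is now a dangling number */

        int fresh = open(second_path, O_RDONLY);
        if (fresh < 0)
            return -1;

        /* Any code still holding `stale` would now be operating on
         * second_path rather than getting EBADF. */
        int reused = (fresh == stale);
        close(fresh);
        return reused;
    }
    ```

    In a single-threaded process, `fd_number_reused("/dev/null", "/dev/zero")` should return 1. In the multithreaded case described above, the close and the second open happen on other threads, between my check and my write.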

    This new mechanism effectively provides a file descriptor table in a separate object. Operations on it are uncontended (io_uring starts operations sequentially, so the same kernel lock that protects the ring can also protect the per-ring descriptor table). This should make workloads that need a lot of short-lived file descriptors (or a lot of file descriptors opened across many different threads) much more efficient.