1. 18
  1.  

    1. 4

      Their implementation gets to futexes at the end.

      Linux’s futexes are quite annoying in a few ways, but most especially in that they’re 32 bits on all platforms. This means that you can’t use them for pointers, which precludes a number of interesting synchronisation data structures. FreeBSD’s _umtx_op verb that is closest to a futex operates on either 32- or 64-bit values; Windows’ WaitOnAddress operates on 8-, 16-, 32-, or 64-bit values.

      There are a few other nice things in _umtx_op. In particular, it exposes some richer data structures. The goal of all of these systems is to provide a fast path in userspace that doesn’t require a system call and a slow path for either sleeping or waking one or more threads.
      If you hit the slow path, the kernel is going to do some operations on the userspace word with a lock held. This means that it can read and write multiple words, as long as it does so with respect to the userspace contract.
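
      To make the fast path / slow path split concrete, here is a minimal sketch, assuming Linux and C11 atomics. It follows the three-state design (0 = unlocked, 1 = locked, 2 = locked with possible waiters) from Ulrich Drepper's "Futexes Are Tricky"; the names `toy_mutex`, `toy_lock`, `toy_unlock` and `futex_call` are made up for illustration and are not from the implementation under discussion.

      ```c
      #define _GNU_SOURCE
      #include <assert.h>
      #include <linux/futex.h>
      #include <stdatomic.h>
      #include <stddef.h>
      #include <stdint.h>
      #include <sys/syscall.h>
      #include <unistd.h>

      /* 0 = unlocked, 1 = locked, 2 = locked and someone may be sleeping. */
      typedef _Atomic uint32_t toy_mutex;

      /* Thin wrapper: glibc does not export a futex() function. */
      static long futex_call(toy_mutex *addr, int op, uint32_t val) {
          return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
      }

      static void toy_lock(toy_mutex *m) {
          uint32_t c = 0;
          /* Fast path: uncontended 0 -> 1, no system call. */
          if (atomic_compare_exchange_strong(m, &c, 1))
              return;
          /* Slow path: mark the lock contended, then sleep while it stays 2. */
          if (c != 2)
              c = atomic_exchange(m, 2);
          while (c != 0) {
              futex_call(m, FUTEX_WAIT_PRIVATE, 2);
              c = atomic_exchange(m, 2);
          }
      }

      static void toy_unlock(toy_mutex *m) {
          uint32_t prev = atomic_fetch_sub(m, 1);
          assert(prev != 0); /* unlocking an unlocked mutex is a caller bug */
          /* Fast path: it was 1, nobody is waiting, no system call. */
          if (prev != 1) {
              /* Slow path: there may be sleepers, so release and wake one. */
              atomic_store(m, 0);
              futex_call(m, FUTEX_WAKE_PRIVATE, 1);
          }
      }
      ```

      The only system calls are in the contended branches. An uncontended lock/unlock pair is two atomic operations in userspace.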

      More importantly, _umtx_op has a sane way of reporting timeouts. For a sleeping operation, you almost always want to use the monotonic clock, or you can end up sleeping for hours instead of milliseconds because the clock changed. futex gets this right. But when you wake from a spurious wake (e.g. as the result of receiving a signal), how long do you have to sleep for? With futex, you need to query the time again. In contrast, _umtx_op’s semaphore operation allows the kernel to report back the remaining time so that you can now sleep for that long on retry. Annoyingly, most other _umtx_op operations don’t support this.
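
      For the futex side of that comparison, a rough sketch of the retry loop, assuming Linux: `FUTEX_WAIT` takes a relative timeout measured against `CLOCK_MONOTONIC`, so after any wake that did not change the word the caller has to re-read the clock and recompute how much time is left. `now_ns` and `wait_while_equal` are hypothetical helper names.

      ```c
      #define _GNU_SOURCE
      #include <assert.h>
      #include <linux/futex.h>
      #include <stdatomic.h>
      #include <stddef.h>
      #include <stdint.h>
      #include <sys/syscall.h>
      #include <time.h>
      #include <unistd.h>

      #define NSEC_PER_SEC 1000000000LL

      /* Current monotonic time in nanoseconds. */
      static int64_t now_ns(void) {
          struct timespec ts;
          clock_gettime(CLOCK_MONOTONIC, &ts);
          return (int64_t)ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
      }

      /* Sleep while *word == expected, for at most timeout_ns in total.
       * Returns 0 if the value changed, -1 if the timeout expired. */
      static int wait_while_equal(_Atomic uint32_t *word, uint32_t expected,
                                  int64_t timeout_ns) {
          assert(timeout_ns >= 0);
          int64_t deadline = now_ns() + timeout_ns;
          while (atomic_load(word) == expected) {
              /* The extra clock query on every retry is the cost being
               * discussed: the kernel does not report the remaining time. */
              int64_t left = deadline - now_ns();
              if (left <= 0)
                  return -1;
              struct timespec ts = {
                  .tv_sec = left / NSEC_PER_SEC,
                  .tv_nsec = left % NSEC_PER_SEC,
              };
              /* EINTR, EAGAIN and ETIMEDOUT are all handled by looping. */
              syscall(SYS_futex, word, FUTEX_WAIT_PRIVATE, expected, &ts, NULL, 0);
          }
          return 0;
      }
      ```

      With an interface that hands back the remaining time, the `deadline` bookkeeping and the repeated `now_ns()` calls would not be needed.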

      1. 1

        Out of curiosity, could you happen to link to some interesting data structures that use 64-bit values?

        1. 1

          On macOS, you get 32- or 64-bit futexes.

          You can do pointers in only 32 bits if you establish context for them, which is very doable. But I do agree more bits is better, mainly due to the ability to perform more orthogonal wait-free operations. In the J interpreter, for instance, the upper 16 bits of each futex are reserved by the system, in order to get reliable interruption, so you only get 16 bits for the actual futex value, which is a bit more austere.
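
          One way to "establish context" could look like the sketch below: keep every object inside a single arena and store a 32-bit offset from its base in the futex word instead of a raw pointer. The `arena` type and both helpers are hypothetical, and this assumes the arena is smaller than 4 GiB.

          ```c
          #include <assert.h>
          #include <stddef.h>
          #include <stdint.h>

          /* Hypothetical context: all pointees live in [base, base + size). */
          typedef struct {
              char *base;
              size_t size;
          } arena;

          /* Encode a pointer as offset + 1 so that 0 can stand for NULL. */
          static uint32_t ptr_to_word(const arena *a, const void *p) {
              if (p == NULL)
                  return 0;
              size_t off = (size_t)((const char *)p - a->base);
              assert(off < a->size && off < UINT32_MAX);
              return (uint32_t)off + 1;
          }

          /* Decode a 32-bit word back into a pointer into the arena. */
          static void *word_to_ptr(const arena *a, uint32_t w) {
              return w == 0 ? NULL : a->base + (w - 1);
          }
          ```

          The resulting word fits in a 32-bit futex, at the cost of an add or subtract on every access and a hard limit on the arena size.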

          > But when you wake from a spurious wake (e.g. as the result of receiving a signal), how long do you have to sleep for? With futex, you need to query the time again. In contrast, _umtx_op’s semaphore operation allows the kernel to report back the remaining time so that you can now sleep for that long on retry

          On Linux, getting the time doesn’t require a context switch, thanks to the vDSO. So I don’t think this is a very big deal. (A vDSO on FreeBSD would be cool. I think I saw a patch for it floating around, but it was never merged.)

          E: and, of course, even 64 bits isn’t enough for CHERI :)

          1. 1

            FreeBSD has a VDSO now (though the page is shared across processes; Linux moved to a different page per process, which lets you use it for things like getpid). It’s had a shared page for getting the time source for a while. But having to get the time like this is still annoying: you pass a relative time as the timeout, so with the FreeBSD API you never actually need to query the clock in userspace. You specify the timeout and, if you wake spuriously, give the kernel back the remaining time that it reported.