1. 10
  1. 1

    If you’re looking for the answer to the headline question, scroll about halfway down until you see the term in a section heading. Yeesh.

    1. 8

      It’s not a very good explanation. For example, it starts with:

      Before getting into futex, we need to know which problem it solves: race condition

      Futexes do nothing to prevent race conditions. They are a performance optimisation. You could remove every futex call in a program and replace it with a spin loop. It would then get a lot slower because threads would wait until their quantum is exhausted before yielding and so the thread that could make progress would not get CPU time until later.

      You can build a lot of synchronisation primitives on top of futex. Anything that uses a compare-and-swap in its spin path can be converted into a sleeping lock with futex, so it can implement semaphores, mutexes, condition variables, and barriers quite easily. For example, you can use a word to represent the value of a counting semaphore. On the fast path, you read the value, if it’s greater than zero, compare-and-swap to decrement it and a simple atomic increment to increment it. You need to sleep if the value is already 0, so on that path you do a futex call with the wait operation. The kernel will internally acquire a lock, check that the value is still 0, and sleep the calling thread if it is, until there’s a corresponding wake call on the same address. When you increment, if you increment from 0 then there may be sleepers and so you do the futex wake operation to wake one or more of them up. You can optimise this by tracking waiters.

      Similarly, you can implement a barrier by using two words. One that is a counter of threads that haven’t yet arrived at the barrier, the other that is the number of times the barrier has been triggered. When you reach the barrier, you first read the second word and then atomically decrement the first word. If it the first word isn’t zero, then you do a futex-wait on the second word and wake up when it changes. If the first word has reached zero then you are the last thread to reach the barrier, so you increment the second word do a futex wake op on it to wake everyone else up. It doesn’t matter if the second word overflows, as long as the value changes every time all threads rendezvous at the barrier.

      The futex call supports a whole bunch of exciting things, for example bitfield operations allowing you to treat a 32-bit words as 32 separate mutexes and acquire or release any subset in a single atomic op on the fast path and block waiting for up to 32 of them.

      I prefer the FreeBSD version, _umtx_op in general because it has a richer set of data structures to operate on, each optimised for a common concurrency primitive. In particular, it has a much better way of handling timeouts (at least for the _sem2 variant). Both futex and _umtx_op are allowed to spuriously wake and if you need to retry then you need to know how much of your timeout has elapsed. _umtx_op helpfully provides the remaining time as an output so that you can just use that on your next loop iteration.

      1. 1

        Slight nitpick, but for the semaphore example, it would need to wake all the waiters on post observing zero unless it risks deadlocking:

        • 3 threads see 0 and wait
        • a post happens, observed 0, wakes one thread
        • two more post happen and don’t observe 0
        • 1 of the 3 waiting threads wakes up and decrements the value to 2 since it’s now 3
        • the value is now 2 (non zero) but the other 2 threads are still waiting with nothing to wake them up.
        1. 1

          Yes, sorry, I was lax in my language to the point of incomprehensibility. This is what I meant by ‘wake one or more up’ (FUTEX_OP_WAKE with INT_MAX) but reading it again I would have interpreted it as FUTEX_OP_WAKE with 1 or some other arbitrary number. Note that this is not intended to be a good implementation of a semaphore (all of the waiting threads will wake up and then all except one will immediately sleep), it’s just the simplest.