1. 24

An incredibly detailed account of the WebKit’s team’s adaptive locking project (a lock that uses a global concurrent hashtable for parking and unparking threads). The end result is a two-bit lock and across-the-board performance increases.

  1. 10

    The section on barging vs convoying is particularly interesting to me. OpenBSD (user land) spin locks are unfair, which I decided was bad. I replaced them with ticket locks, which enforce fifo behavior. All is better, right? Micro tests certainly showed increased fairness, but larger applications (like, uh, a certain browser) became much slower. Not the desired result. Stupid communists and their fairness, wrong about everything!

    1. 2

      If you’re optimizing for performance, there’s no one-size-fits-all lock implementation to use. That’s why libraries such as Intel’s TBB provide both queuing mutexes and spin locks - as well as reader-writer variations of both. As mentioned in the OP, spinlocks are appropriate when contention is light and critical sections are short. Queueing mutexes work best when contention is moderate to heavy and critical sections are longer.

      On TSX-enabled hardware, it’s also worth trying TBB’s speculative_mutex which attempts to use the XBEGIN and XEND instructions to execute the critical section, falling back to spinning if too many aborts happen. If a lock is mostly uncontended, XBEGIN/XABORT is about 20% faster than the equivalent spinlock acquisition in my experience.

    2. 1

      Looking at the parking lot api, it looks like an emulation of what the kernel does in futexes. I’m curious if/why this is more efficient than simply using futexes directly for waiting (or similar APIs on other systems).

      1. 1

        The futex may be more efficient, but not available on many target platforms. An optional futex backend could be a good refinement after they get the generic version working.

        1. 1

          AFAIK futexes are not implemented in OS X. Shame, as they seem like a really valuable API to have.

          1. 2

            Right. They have psynch_mutex*, though, which I’m currently trying to reverse engineer for a personal thing.

            I’m still trying to sort out the semantics, but it… maybe… looks like it could do similar things. (And if there’s anyone reading who is familiar with how exactly it works, I would buy you many beers for some explanation.)