This article misses a few things:
UINT32_MAX % N == 0
& (N - 1)
This trick can be a nice win when writing lock free queues, but for normal queues it’s so marginal it’s probably not worth it.
An actual modulo is just too expensive (unless N is a constant, and the compiler has an optimization for that). For the other two designs you don’t need to use mod to support arbitrary values of N, a single conditional will suffice to wrap the values. That won’t work when the indices have an unlimited range.
(Also, surely only powers of two are going to be factors of another power of two?)
Yeah you’re right, it’s funny I never noticed that.
Turns out my entire post is wrong. In lock free code, you actually want the opposite of this trick - keeping reader/writer pos separate. If you have head + length you need to update both variables (e.g. enqueue is rougly head++ length–) which requires a CAS loop that’s twice as wide.
If you’re doing an SPSC ring buffer you can keep the reader/writer pos as non-atomic ints in separate cache lines, and add an atomic flag to each node saying whether it was last read or written. You don’t waste any slots and you only get contention when the queue is nearly empty/full.
It’s amazing how often these things seem to come up:
Ring buffers seem to come up particularly often in embedded code. They’re fairly memory-efficient, have predictable time and space costs (which can be critical!), and can be sized individually based on capacity planning.
There are not a lot of fancy data structures I’d trust to run in an interrupt handler, for example, but a ring buffer seems like an obvious choice for handling an incoming serial data interrupt.
Also, I’ve used the trick described in the post (letting the counters keep increasing, and masking on index) several places, and seen several others use it. I don’t think it’s that uncommon. It’s certainly worth knowing. When the ring buffer empties or one wraps the other, one can reduce both indices without changing the delta, to avoid the possibility of overflow. In many cases, it’s not a realistic concern, though. (i.e., enough operations to cause overflow would also significantly exceed the wear life of the flash memory the code is managing.)
Friends don’t let friends use bitmasks when they could have just used an extra element in the (arguably) simpler earlier versions. Depending on the size of each element I’d prefer simpler code. Plus the no mask versions are basically a design pattern that many programmers are familiar with so it should be somewhat self documenting.
But then how are your peers suppose to know how smart you are?
[Comment from banned user removed]
Please, let’s not turn Lobste.rs into Reddit with threads of tired old witticisms, eh? It adds no value to the discussion.
I like that virtual memory trick that lets you remove the wrap around logic altogether. Too bad it’s not portable.