My colleagues have been refactoring BIND9 along these lines in recent years. Originally BIND had its own multithreaded, event-based I/O multiplexer (type 2 in this article’s terms). It was replaced by libuv, though still type 2, because so much of the code was not designed to stay put on one CPU. The next release, BIND 9.20, should be mostly type 3: a libuv event loop per core, with no need for locking around socket operations. Gradually turning the old workhorse into something a bit more modern…
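In case the “loop per core” shape is unfamiliar, here is a minimal C sketch of the type 3 pattern with libuv. This is my own illustration, not BIND’s actual code: each worker thread owns its own `uv_loop_t`, so handles never migrate between threads and socket operations need no locks.

```c
/* Sketch only (not BIND's code): one libuv event loop per worker thread,
 * so every handle is touched by exactly one thread and needs no locking. */
#include <uv.h>

#define NUM_WORKERS 4  /* e.g., one worker per core */

static void worker(void *arg) {
    (void)arg;
    uv_loop_t loop;
    uv_loop_init(&loop);

    /* Handles (uv_udp_t / uv_tcp_t listeners, timers, ...) would be created
     * here; they live on this loop only, and all their callbacks run on
     * this thread. */

    uv_run(&loop, UV_RUN_DEFAULT);  /* returns once no active handles remain */
    uv_loop_close(&loop);
}

int main(void) {
    uv_thread_t threads[NUM_WORKERS];
    for (int i = 0; i < NUM_WORKERS; i++)
        uv_thread_create(&threads[i], worker, NULL);
    for (int i = 0; i < NUM_WORKERS; i++)
        uv_thread_join(&threads[i]);
    return 0;
}
```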
Work distribution is always a challenging problem when designing type 3 systems. Is it possible to go into what BIND 9.20’s policy is there? Similar systems tend to rely on the kernel’s accept() behaviour or use round-robin to distribute tasks across cores, which doesn’t seem great for CPU utilization if there are large or varying-sized units of work to perform.
At the moment we are just relying on the kernel: there are enough performance bottlenecks elsewhere that I think load distribution is not our biggest problem right now. But the loop-per-cpu refactor is still very new, so there is still a lot left for us to find out!
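For what it’s worth, “relying on the kernel” for spreading incoming traffic across per-core loops is commonly done with SO_REUSEPORT: each loop binds its own socket to the same port, and the kernel picks which socket receives each packet or connection. A minimal sketch of that general pattern follows; it is an assumption about the usual technique, not a claim about BIND’s implementation.

```c
/* Sketch: let the kernel distribute work via SO_REUSEPORT. Each per-core
 * loop opens its own UDP socket bound to the same port; the kernel then
 * spreads incoming packets across the sockets. General pattern only. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static int open_reuseport_udp(uint16_t port) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    int one = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) < 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;  /* one such fd per worker loop, all bound to the same port */
}
```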
Why is this tagged as C? I think it should be tagged with zig.
pzero is primarily a C project with experiments done in Zig on the side.
Suggest dropping the person tag, since I don’t think that’s a primary component of this story.