Pretty common bug. I’ve fixed exactly this kind of bug before: the code means to allocate 4096 bytes but actually requests 4096 + 8 bytes, which the allocator rounds up to 8192 bytes.
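To make the arithmetic concrete, here is a minimal sketch of how a power-of-two size-class allocator could turn 4096 + 8 into 8192. The 8-byte `HEADER` and the `round_up_pow2` helper are assumptions for illustration, not the code from the actual bug.

```python
def round_up_pow2(n: int) -> int:
    """Return the smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


HEADER = 8  # assumed per-allocation bookkeeping overhead, in bytes

requested = 4096
bucket = round_up_pow2(requested + HEADER)  # 4104 no longer fits in 4096

print(bucket)              # 8192
print(bucket - requested)  # 4096 bytes of the bucket go unused
```

An exact 4096-byte request would have stayed in the 4096 bucket; the extra 8 bytes are what push it into the next size class.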
I wonder if a simpler fix across the board would have been to modify the category sizes to be something like 2^n+32. That was my instinct when reading this, and I wonder if it was dismissed for a reason I’m not seeing.
Keeping allocation sizes at powers of two means an allocation crosses as few page or cache-line boundaries as possible, which helps performance.
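A rough sketch of that effect, assuming 4096-byte pages and slots packed back to back from a page-aligned base (both assumptions, as is the `pages_touched` helper):

```python
PAGE = 4096  # assumed page size in bytes


def pages_touched(offset: int, size: int) -> int:
    """Count the pages spanned by `size` bytes starting at byte `offset`."""
    first = offset // PAGE
    last = (offset + size - 1) // PAGE
    return last - first + 1


# Power-of-two slots: each 4096-byte slot sits on exactly one page.
print([pages_touched(i * 4096, 4096) for i in range(4)])  # [1, 1, 1, 1]

# 2^n + 32 slots: each 4128-byte slot straddles a page boundary.
print([pages_touched(i * 4128, 4128) for i in range(4)])  # [2, 2, 2, 2]
```

The same reasoning applies to cache lines if `PAGE` is swapped for the cache-line size, for example 64.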