They’re likely using modern x86 CPUs, which do have instructions to accelerate crc32/crc32c. I have no idea why they’re not just using those.
It is also very complicated-looking high-level code that smells of premature optimization, potentially doing way more harm than good on a modern compiler. An asm implementation would be like 30 lines.
I wonder what a clean way to do this would be. Maybe some assembly implementations in their own source files, for different CPUs, with some runtime detection, with a C or C++ implementation as fallback.
It appears that they are using those instructions. If you have at least SSE 4.2 support, they use _mm_crc32_u64 at https://github.com/facebook/rocksdb/blob/main/util/crc32c.cc#L365. And if you additionally have pclmulqdq support, they compute 3 CRCs in parallel and combine them: https://github.com/facebook/rocksdb/blob/main/util/crc32c.cc#L686. There’s also support for extensions on non-x86 CPUs (ARM and PowerPC) as well as a portable fallback path for when none of the above are available.
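For anyone who wants to see the basic shape of that, here is a minimal sketch of an SSE 4.2 path with runtime detection and a portable fallback. This is not RocksDB's code: the names crc32c_portable, crc32c_sse42, crc32c and SKETCH_HAVE_SSE42_PATH are made up for illustration, the hardware path assumes GCC or Clang on x86-64, and the 3-way pclmulqdq variant is left out entirely.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <nmmintrin.h>  // SSE 4.2 intrinsics: _mm_crc32_u8, _mm_crc32_u64
#define SKETCH_HAVE_SSE42_PATH 1  // hypothetical macro, only for this sketch
#endif

// Portable fallback: bit-at-a-time CRC32C using the reflected Castagnoli
// polynomial 0x82F63B78. Slow, but it runs on any CPU.
static uint32_t crc32c_portable(const void* data, size_t len, uint32_t crc = 0) {
  const unsigned char* p = static_cast<const unsigned char*>(data);
  crc = ~crc;
  for (size_t i = 0; i < len; ++i) {
    crc ^= p[i];
    for (int bit = 0; bit < 8; ++bit) {
      // Shift right; XOR in the polynomial if the bit shifted out was 1.
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  }
  return ~crc;
}

#ifdef SKETCH_HAVE_SSE42_PATH
// The target attribute enables SSE 4.2 for this one function, so the rest
// of the file can still be compiled for a baseline x86-64 CPU.
__attribute__((target("sse4.2")))
static uint32_t crc32c_sse42(const void* data, size_t len, uint32_t crc = 0) {
  const unsigned char* p = static_cast<const unsigned char*>(data);
  uint64_t state = static_cast<uint32_t>(~crc);
  while (len >= 8) {
    uint64_t word;
    std::memcpy(&word, p, sizeof word);  // safe unaligned load
    state = _mm_crc32_u64(state, word);  // 8 bytes per instruction
    p += 8;
    len -= 8;
  }
  uint32_t state32 = static_cast<uint32_t>(state);
  while (len-- > 0) {
    state32 = _mm_crc32_u8(state32, *p++);  // remaining tail bytes
  }
  return ~state32;
}
#endif

// Dispatcher: checks the CPU once, then picks the fastest available path.
uint32_t crc32c(const void* data, size_t len) {
#ifdef SKETCH_HAVE_SSE42_PATH
  static const bool has_sse42 = __builtin_cpu_supports("sse4.2") != 0;
  if (has_sse42) {
    return crc32c_sse42(data, len);
  }
#endif
  return crc32c_portable(data, len);
}
```

Callers only ever see crc32c(data, len). A real implementation would replace the bitwise fallback with a table-driven one and add further hardware paths behind the same dispatcher.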
Oh, they are indeed. It really looks obtuse.