It’s interesting how much CHERI influenced the designs of post-2017 Arm extensions. We were talking to Arm when they were designing SVE and the over-read problem came up then. With most architectures, you can safely over-read as long as you don’t cross a page boundary, so doing it for things within an aligned vector is fine. With a CHERI system, you have byte-granularity memory safety and so the over-read can fault. I didn’t realise that they’d added special behaviour to SVE to permit this to work safely.
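To make the page-boundary argument concrete, here is a rough sketch in C of the address arithmetic behind it. The 4 KiB page size and all the function names are my own assumptions for illustration, not anything from Arm or CHERI. It only computes addresses and never actually over-reads, since doing that in portable C would be undefined behaviour anyway.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for illustration; real code would query the OS. */
#define PAGE_SIZE_ASSUMED 4096u

/* Round addr down to the start of its naturally aligned block.
   block must be a non-zero power of two. */
static uintptr_t align_down(uintptr_t addr, size_t block)
{
    assert(block != 0 && (block & (block - 1)) == 0);
    return addr & ~((uintptr_t)block - 1);
}

/* True if the aligned block containing addr lies within a single page.
   This always holds when block is a power of two no larger than the page,
   which is why an aligned vector load cannot fault on page protection
   if any byte of it is valid. */
static bool aligned_block_stays_in_page(uintptr_t addr, size_t block)
{
    uintptr_t start = align_down(addr, block);
    uintptr_t end = start + block - 1;
    return start / PAGE_SIZE_ASSUMED == end / PAGE_SIZE_ASSUMED;
}

/* True if an unaligned read of len bytes starting at addr spills into
   the following page, which is the case that can fault. */
static bool unaligned_read_crosses_page(uintptr_t addr, size_t len)
{
    return (addr % PAGE_SIZE_ASSUMED) + len > PAGE_SIZE_ASSUMED;
}
```

With page-granularity protection the first predicate is the whole safety argument. With byte-granularity bounds, as on CHERI, it tells you nothing: the aligned block can still extend past the end of the object the capability covers.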
I wish he’d done an actual benchmark instead of just counting instructions and calling it a day; although I recognize this is just a quick blog post and not an academic paper :)
I would assume/hope my C library’s strlen is already optimized up the wazoo with CPU-specific SIMD, and likewise related functions like strchr, memchr, etc. But IIRC there have been earlier threads here explaining why that isn’t always the case.