1. 26
    1. 9


      I work pretty heavily with Hyperscan, a really amazing regular expression matching library, and it’s very heavily tied to x86_64 instruction sets. There have been a few attempts to port it to other architectures (I’m aware of at least an ARM port). Something like this was used for that port, I think.

      (Aside from its just incredible speed, Hyperscan has the most ergonomic API for the kinds of things I need it to do: matching multiple expressions at once, and reporting matches as they happen via callback. Other libraries do one or the other but not both, and often the matching-multiple-expressions-at-once thing is reduced to “see if any match”, which is distinct from “report matches for all these”. Another nice ergonomic feature is that the library lets the user assign identifiers to the different expressions, and those identifiers are reported back in the callback, rather than being library-assigned or purely positional.)
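To make the "see if any match" versus "report matches for all these" distinction concrete, here is a minimal sketch in plain Python using the standard `re` module. It is not Hyperscan and does not use its API: `scan_all`, `scan_any`, and the IDs 100 and 200 are made-up names for illustration, and this version loops over the patterns one at a time rather than matching them all in a single pass.

```python
import re
from typing import Callable, Dict

# Hypothetical callback shape: (pattern_id, start_offset, end_offset) -> None
MatchCallback = Callable[[int, int, int], None]


def scan_all(patterns: Dict[int, str], data: str, on_match: MatchCallback) -> None:
    """Report every match of every pattern, tagged with the caller-chosen ID."""
    compiled = {pid: re.compile(expr) for pid, expr in patterns.items()}
    for pid, regex in compiled.items():
        for m in regex.finditer(data):
            on_match(pid, m.start(), m.end())


def scan_any(patterns: Dict[int, str], data: str) -> bool:
    """The weaker 'see if any match' form: no IDs, no offsets, just a yes/no."""
    return any(re.search(expr, data) for expr in patterns.values())


# The caller picks the IDs (100 and 200 here); they come back in the callback.
hits = []
scan_all(
    {100: r"foo", 200: r"ba[rz]"},
    "foo bar baz",
    lambda pid, start, end: hits.append((pid, start, end)),
)
```

After this runs, `hits` holds one tuple per match, each carrying the ID the caller assigned, while `scan_any` on the same input would only say `True`.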

      1. 2

        Hyperscan seems really great!

        A quick search and I found a solid patchset for Ripgrep (a Rust-based alternative to Grep known to be very fast) which enables Hyperscan integration: https://git.sr.ht/~pierrenn/ripgrep

        Apparently the burden of supporting the integration upstream was too much for Ripgrep’s maintainer, but he’s worked with the author of the patchset to make it easier for them to maintain it out of tree instead.

      2. 2

        Your link is broken. You need to add www. FTFY: Hyperscan

        1. 2

          Thank you, fixed.

    2. 1

      I’m very curious how the performance of more complex, higher-level operations compares between hand-optimized SIMD targeting each platform independently and auto-translating individual instructions. The VOLK library contains hand-optimized implementations of a variety of digital signal processing operations. It has a generic C implementation for every operation “kernel”, and some specialized “protokernels” targeting SSE{2,3,4}, AVX{2,512}, NEON, etc. But most kernels are missing some of the architectures. Maybe SIMDe could be used to fill in the gaps with a potentially faster version by auto-translating existing kernels. VOLK profiles all the available options automatically, so if the translated versions turned out slower, it wouldn’t negatively affect the runtime results. https://www.libvolk.org/
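The "profile every available implementation and keep the fastest" idea can be sketched in a few lines of Python. This is only an illustration of the selection step, not VOLK's actual mechanism or API: `dot_generic`, `dot_translated`, and `pick_fastest` are hypothetical names, and the two "kernels" are ordinary Python functions standing in for a generic implementation and an auto-translated one.

```python
import timeit
from typing import Callable, Dict, Sequence

Kernel = Callable[[Sequence[float], Sequence[float]], float]


def dot_generic(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in for the always-available generic implementation."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


def dot_translated(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in for an auto-translated variant of the same operation."""
    return sum(x * y for x, y in zip(a, b))


def pick_fastest(kernels: Dict[str, Kernel], a, b, repeats: int = 5) -> str:
    """Time every candidate on sample data and return the name of the fastest."""
    timings = {
        name: min(timeit.repeat(lambda k=k: k(a, b), number=10, repeat=repeats))
        for name, k in kernels.items()
    }
    return min(timings, key=timings.get)


kernels = {"generic": dot_generic, "translated": dot_translated}
sample = [float(i) for i in range(1000)]
best = pick_fastest(kernels, sample, sample)
```

Because the choice is made from measurements, adding a candidate that turns out slower costs nothing at runtime: it simply never gets picked.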