1. 6

  2. 5

    I’d love to see an explanation of what those SIMD instructions mean and how they achieve what they achieve … It’s interesting, but some more information would help.

    From looking at the code on GitHub, it seems to compare a vector of 8 values at a time, newval, against a copy of itself shifted by one position, vecTmp, and then turn the result into a binary mask M. That mask is used to index into a table, uniqshuf, which tells it how to rearrange the numbers into a shorter, deduplicated vector. Again, some more time spent on the tradeoffs here would be interesting …
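
To make that reading concrete, here is a rough scalar model of the idea in Python. It is only a sketch of my understanding, not the code from the repository: the names `build_uniqshuf`, `dedup_block` and `dedup_sorted` are made up for illustration, the input is assumed to be sorted, and plain lists stand in for the SIMD registers. The real code presumably does the compare, the mask extraction and the shuffle with one vector instruction each.

```python
LANES = 8  # number of values processed per "vector" (assumed, matching the 8-wide description)


def build_uniqshuf():
    """Precompute, for every 8-bit mask, which lanes survive deduplication.

    Bit i of the mask is set when lane i equals the value just before it,
    so the surviving lanes are exactly those whose bit is clear.
    """
    return [[i for i in range(LANES) if not (m >> i) & 1]
            for m in range(1 << LANES)]


UNIQSHUF = build_uniqshuf()  # hypothetical stand-in for the uniqshuf table


def dedup_block(prev_last, newval):
    """Deduplicate one sorted block of LANES values.

    prev_last is the last value of the previous block (None for the first
    block), so a duplicate that straddles two blocks is still detected.
    """
    # vecTmp: newval shifted by one lane, with prev_last filling lane 0.
    vec_tmp = [prev_last] + newval[:-1]
    # Lane-wise equality compare, packed into a bitmask M.
    m = 0
    for i in range(LANES):
        if newval[i] == vec_tmp[i]:
            m |= 1 << i
    # The table entry says how to pack the surviving lanes to the front.
    return [newval[i] for i in UNIQSHUF[m]]


def dedup_sorted(values):
    """Deduplicate a sorted list, one block at a time, with a scalar tail."""
    out = []
    prev = None
    full = len(values) - len(values) % LANES
    for start in range(0, full, LANES):
        block = values[start:start + LANES]
        out.extend(dedup_block(prev, block))
        prev = block[-1]
    # Leftover values that do not fill a whole block.
    for v in values[full:]:
        if v != prev:
            out.append(v)
            prev = v
    return out


if __name__ == "__main__":
    print(dedup_sorted([1, 1, 2, 3, 3, 3, 4, 5, 5, 6, 7, 7]))
```

One tradeoff this makes visible: the table needs one entry per possible mask, so 256 entries for 8 lanes. If each entry holds eight 32-bit lane indices, that is 8 KiB of lookup data, and the size doubles with every extra lane. In exchange, the per-block work contains no data-dependent branches.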