1. 9

  2. 8

    The measurement turned out to be a mistake. The author posted on Twitter:

    My blog post was badly wrong. I should have written:

    1. The Apple M1 is a match for Intel/AMD AVX2.
    2. Apple’s x64 emulation layer is so good that I mistakenly benchmarked under it without realizing it.

    Lessons learned.


    1. 1

      The current measurements are correct; I’ve checked them on my own M1 (although I haven’t checked the Intel ones). The original issue, as discovered by several people on Twitter and by me on the HN thread, was twofold:

      1. The author accidentally ran the benchmarks under Rosetta
      2. The ARM64 NEON optimised code wasn’t getting compiled in (SIMDJSON_IMPLEMENTATION_ARM64 wasn’t getting #define’d)

      Once both of these were addressed, the code sped up significantly. This story really is a testament to how good Rosetta is. It’s seamless from the OS side, and it’s fast. If FatELF had been merged, it wouldn’t be too hard to implement something just as seamless under Linux using binfmt_misc. But it wouldn’t be as performant on multithreaded x86 code, because Apple has hardware support for the x86 memory model.
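
      Both of the problems above can be checked from inside the benchmark binary itself. The following is a minimal sketch, not the author's actual fix: it assumes the `sysctl.proc_translated` key that Apple documents for detecting a translated process, and it uses the compilers' predefined architecture macros to report what the binary was built for. The helper names `process_is_translated` and `compiled_arch` are made up for illustration.

      ```c
      #include <stddef.h>
      #ifdef __APPLE__
      #include <errno.h>
      #include <sys/types.h>
      #include <sys/sysctl.h>
      #endif

      /* Hypothetical helper: returns 1 if the process runs under Rosetta
       * translation, 0 if it runs natively, -1 if the state is unknown. */
      static int process_is_translated(void) {
      #ifdef __APPLE__
          int translated = 0;
          size_t size = sizeof(translated);
          if (sysctlbyname("sysctl.proc_translated", &translated, &size, NULL, 0) == -1) {
              /* ENOENT means the key does not exist, i.e. no translation layer. */
              return errno == ENOENT ? 0 : -1;
          }
          return translated;
      #else
          /* Assumption: on non-Apple systems there is no Rosetta to detect. */
          return 0;
      #endif
      }

      /* Hypothetical helper: names the architecture this binary was compiled
       * for, using macros predefined by GCC/Clang and MSVC. */
      static const char *compiled_arch(void) {
      #if defined(__aarch64__) || defined(_M_ARM64)
          return "arm64";
      #elif defined(__x86_64__) || defined(_M_X64)
          return "x86_64";
      #else
          return "other";
      #endif
      }
      ```

      A benchmark harness could print both values at startup. A result of `x86_64` together with a translated process on an M1 would point at exactly the mistake described above.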

    2. 4

      I’m already tired of all those articles about Apple’s new chip. 🤷‍♂️ Are people now hyping every new chip as long as it is coming from Apple? Will anyone say a word if the next generation of Intel chips beats the M1, except of course “Apple will surely bring out the M2 soon”? And when will people finally stop trying to bend their needs towards Apple products? How many complaints did I hear years ago about how Docker is shit on MacBooks, and how many people celebrated the fixes to those issues, which are now suddenly “ok without Docker and surely Apple will fix this issue in the next generation”? [/rant]

      1. 6

        (Disclaimer: I’m a Mac user, don’t have an M1 Mac, and don’t know when I will.)

        I think there’s a worthwhile secondary point to this hype: it’s not just how great the new chips are, it’s also about how slow the progress of x86 (and especially Intel) has been. Apple is showing what’s possible right now; people buying non-Apple products should be demanding similar performance from their suppliers.

        1. 3

          Probably because Apple’s the only one:

          1. Making chips that are fast. Intel and Qualcomm have been stagnating and we’re celebrating AMD for catching up. As soon as the A7 came out, people drew comparisons like “shit, this is fast as Haswell-U”, and then the rumour mill started. From then on it’s been basically just linear improvements every year like clockwork, something Intel hasn’t done in a while. We’d be singing a different tune if Intel actually was improving their CPUs.
          2. Making chips in devices that people can just go out and buy, and are likely to use. Yeah, Neoverse might be nice, but it’s a lot harder to get your hands on it other than “mmm rent from AWS”. This is something you can walk up to the store and buy in a phone/tablet/laptop - one of the most popular makes, at that.

          Disclaimer: I bought a new MBA, and I wouldn’t have otherwise if it weren’t for the M1. I know a lot of people in the same boat (buying Macs when they wouldn’t have otherwise) because it’s that significant of an event. If they aren’t buying now, they will next year when the other models get refreshed.

          1. 2

            I may be incorrect, obviously, but it seems the point is that the next round of Intel chips won’t compare IF Apple ARM can continue increasing core count, with excellent compilers.

          2. 2

            This is absolutely insane. I made a comment on the author’s last article here because I had no idea that ARM even did SIMD. The fact that it benchmarks that close to a 3+ year old Intel chip is impressive.

            1. 2

              “I had no idea that ARM even did SIMD”

              Arm has more advanced SIMD than x86 at this point. NEON (which replaced VFP) has been part of Arm since ARMv7 (the Cortex-A8, 2005). Scalable Vector Extensions (SVE) have been in the Arm spec for a couple of years. These were co-developed by Arm and Fujitsu, for Fujitsu’s supercomputers. They support vector sizes from 128 to 2048 bits, which can be dynamically discovered, so the compiler can emit vectorised loops with the stripe size depending on the target that they run on. On x86, you need a recompile to go from SSE to AVX or AVX-512. With SVE, you compile once and can deploy on any vector size hardware, from a phone with a 128-bit vector unit to a current supercomputer with a 512-bit vector unit or a future one with 1024- or 2048-bit vectors.
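
              To make the "compile once, run on any vector size" idea concrete, here is a rough scalar model of a vector-length-agnostic loop in plain C. It is only a sketch and does not use real SVE intrinsics: the `lanes` parameter is a hypothetical stand-in for the lane count that hardware would report at run time, and the `active` count plays the role of the predicate that masks off the tail.

              ```c
              #include <stddef.h>

              /* Hypothetical model of a vector-length-agnostic add.
               * `lanes` stands in for the number of 32-bit lanes per vector that
               * the hardware would report (4 for 128-bit, 64 for 2048-bit). */
              static void vla_add(const float *a, const float *b, float *out,
                                  size_t n, size_t lanes) {
                  if (lanes == 0) {
                      return; /* guard against a stride of zero */
                  }
                  for (size_t i = 0; i < n; i += lanes) {
                      /* Predicate: only the lanes that still fall inside the array
                       * are active, so no separate scalar tail loop is needed. */
                      size_t active = (n - i < lanes) ? (n - i) : lanes;
                      for (size_t l = 0; l < active; l++) {
                          out[i + l] = a[i + l] + b[i + l];
                      }
                  }
              }
              ```

              The same function gives identical results whether `lanes` is 4 or 64, which is the property that lets one binary serve different vector widths. On real hardware the inner loop would be a single predicated vector instruction rather than a scalar loop.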