1. 22
  1. 13

    I think this post has low to zero value because it’s only hand-wavy and relative and doesn’t tell us much. Could’ve been a tweet.

    1. 7

      I’m glad it’s not a tweet. Web pages are better than tweets. It contains a few useful facts, and some background about the author’s use case. Not everything needs to be a massive slog to have value, and any short article that exists will have more value as a web page than a tweet.

      1. 3

        The math in comparing the 8-core M1 vs the 64-core Graviton and claiming the latter is 3 times faster is… interesting. Yes, it took a third of the time to run the same tests on a machine with 8 times as many cores, but that is not the same thing as being three times faster…
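
        A back-of-the-envelope normalisation makes the point. This is just a sketch using the reported 3x wall-clock difference and the 8-vs-64 core counts, and it assumes the workload scales roughly linearly with core count, which the post doesn’t establish:

        ```python
        # Rough per-core normalisation of the numbers quoted above (assumed values:
        # 8-core M1 vs 64-core Graviton, Graviton finishing in a third of the time).
        m1_cores, graviton_cores = 8, 64
        wallclock_speedup = 3.0  # Graviton total throughput relative to the M1

        # Throughput per core, with an M1 core normalised to 1.0.
        graviton_per_core = wallclock_speedup * m1_cores / graviton_cores

        print(f"Graviton per-core throughput: {graviton_per_core:.3f}x an M1 core")
        # => 0.375x, i.e. an M1 core does roughly 2.7x the work of a Graviton core
        # on this workload, even though the Graviton box finishes 3x sooner overall.
        ```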

      2. 5

        I wonder how much of this is down to the fairly insane per-core memory bandwidth on the M1s. I guess some folks at Apple are still suffering from memories of the G4, which was much faster than anything else on the market at the time, except for all of the time that it spent idle waiting for data from memory. With the shared memory between CPU and GPU, I wonder how this differs if you’re also doing something memory-intensive on the GPU.

        I’m still very interested to see if Apple will use their own cores in the Mac Pro series. I’d imagine that their Arm license makes it easy for them to license something like the Neoverse N2, which is probably a bit slower clock-for-clock, but scales to dual-socket configurations with 64 cores per socket.

        1. 2

          Why would Apple want to use inferior cores in their flagship product? I’m sure they are perfectly capable of “scaling” their interconnect to the same number of cores as anyone else.

          1. 6

            A high-performance CPU design is tightly coupled with the design of the memory hierarchy. When you’re building a NoC design and cache coherency protocol, there are a bunch of tradeoffs that you can make in terms of latency versus scalability. An interconnect designed for up to about 16 cores will look quite different from one that is optimised for 64-128 cores. A CPU that wants to achieve peak performance in such an SoC will bake assumptions about this design into its operation. This is particularly important for things like TSO emulation, where being able to assume the maximum latency of a cache-coherency mechanism helps you scale other components.

            From the performance numbers and characteristics, I believe the M1 is likely to be quite aggressively tuned for the current memory controller and cache design. Scaling that up to 128 cores would require a lot of redesign of the non-core bits of the SoC, and this would have knock-on implications for the cores and require a lot of tuning to achieve the same performance per core that they get today.

            The Mac Pro might be their ‘flagship product’ but it’s also their lowest-volume product. They don’t provide breakdowns, but I recall one analyst extrapolating that they have something like a 10:1 ratio of iPhone:MacBook sales and 100:1 MacBook:Mac Pro sales. The current M1 design scales nicely from high-end smartphones up to high-end laptops. Tuning it for a Mac Pro would probably cost a tenth as much as a complete redesign but for a thousandth as many sales. That’s difficult to justify from a purely business standpoint - the NRE would probably kill it unless it’s funded out of marketing budget like the XServe line (which existed purely so Michael Dell couldn’t keep telling people that Steve Jobs bought Dell servers for Pixar because Macs weren’t fast enough). Putting their GPUs and SE in an SoC (possibly even as separate chiplets) with a 64-core Neoverse V1 cluster would still significantly outperform their fastest laptop chips, for a fraction of the cost.
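
            To make the arithmetic explicit, here’s a rough sketch using the ratios above (second-hand analyst extrapolations, not Apple-published figures; the “thousandth” is taken here to be relative to iPhone-class volumes):

            ```python
            # Hypothetical volume ratios quoted above (analyst extrapolations).
            iphone_per_macbook = 10
            macbook_per_macpro = 100
            iphone_per_macpro = iphone_per_macbook * macbook_per_macpro  # 1000:1

            # If a Mac Pro-specific tuning pass costs ~1/10th of a full redesign but
            # is amortised over ~1/1000th of the unit volume, the NRE per unit sold
            # is on the order of 100x higher than for the volume parts.
            relative_nre_per_unit = (1 / 10) / (1 / 1000)
            print(iphone_per_macpro, relative_nre_per_unit)  # 1000 100.0
            ```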

            1. 1

              up to high-end laptops

              And now up to high-end desktops that glue two of the chips together.

              I wouldn’t be surprised if the plan for M2 is something like: more of everything across the board, support interconnecting 4 dies, boom you have enough power for a Mac Pro.

        2. 3

          I can’t wait for other providers to add ARM support too. M1 is a really good CPU, and I’d rather develop and deploy on the same architecture.

          1. 3

            Yep, when Scaleway started deploying ARM cores, it looked pretty exciting for a while. I don’t know why they abandoned it though. Too expensive? Or not enough demand?

            1. 7

              I think Scaleway was a bit ahead of its time. Actual demand was not that big: their first-gen ARM servers were comparable to juiced-up RPis, and the second gen ran on ThunderX and was fairly good for the time, but by the time a third generation of servers was due, the ThunderX line was already dying, courtesy of Marvell, and the ARM-on-servers situation in general was looking a bit dire at that point. Even though ARM had announced its N1 core a year earlier, there were no CPUs with it on the market. AWS had announced its Graviton 2 CPUs a couple of months earlier, but no outside evaluations of their performance existed until AWS made instances with them available, a month after Scaleway’s announcement that it would discontinue ARM instances. Ampere Altra CPUs were announced a couple of weeks after Scaleway’s announcement, but by then the decision had already been made.

              1. 1

                ThunderX1 was kind of a mess. It was the very first real SBSA-ish platform, with a loooot of quirks because of that. And the cores were slower than Cortex-A72.

                Altra CPUs were announced a couple of weeks after Scaleway’s announcement

                For quite some time before the Altras, Ampere was selling their first-gen product (eMAG). These were available as dedicated servers on Packet (now Equinix Metal) but have since been discontinued there.

                1. 1

                  For quite some time before the Altras, Ampere was selling their first gen product (eMAG).

                  Yeah, but eMAG was simply worse than ThunderX2, even with all of its quirks. And even ThunderX2 was outdated for a new product at that time.

                  1. 1

                    Very different markets. TX2 was an ultra-expensive HPC-oriented chip. eMAG was the cheapest cloud-oriented one.