1. 54

  2. 8

    AMD’s CPU division has really upped its game in recent years, particularly in terms of bang/buck. I’d love to see their GPU division do the same, especially given that, unlike Nvidia, they’ve open-sourced and mainlined their graphics drivers.

    1. 3

      I recently switched to AMD GPUs for gaming, after using NVidia basically my whole life. The cost/performance curve is pretty similar all in all, and they work great on Linux; the main differences from NVidia, from what I can see, are that the release cycle is a bit slower and the high end doesn’t go as high.

      Unfortunately, high end and ML is where tons of money lies, and that’s the market NVidia has locked in pretty well.

    2. 5

      Unfortunately, the comparison is between Intel Skylake SP (2018) and AMD Rome (2019) systems. Comparing Rome with Cascade Lake SP (2019) would be a bit more representative for deciding on what system to build right now. (Though I suspect the conclusion would not be that different as Cascade Lake only has some minor microarchitecture and fab process tweaks vs Skylake.)

      1. 4

        Interesting to me: ARM is still a footnote in this writeup, but it appears the pain train is coming for both Intel and AMD in the datacenter. Amazon has the only chips where this is obvious so far (https://perspectives.mvdirona.com/2020/01/aws-graviton2/) but you gotta imagine that comparable chips will be widely available in the next couple years and it’ll be tough for x86 to keep up.

        1. 2

          Ampere’s Quicksilver is coming this year, that’s probably gonna be the real “Graviton2 if you’re not called Jeff Bezos” :)

          For now there’s only the first gen Ampere eMAG which is not powerful enough (though has an absolute ton of I/O at a relatively low cost), and the HPC-oriented Marvell ThunderX2 which is too expensive for general server use.

          1. 2

            Interesting, thanks for the info. Do you see any road for x86 to be competitive, even in the mid term, with some of these new ARM chips? It seems that between the architecture and production-volume advantages, it’s going to be really tough.

            Side note: I wonder if a viable branch prediction attack mitigation would be to just give everyone a dedicated ARM machine. If they were cheap enough you might not even need to virtualize.

            1. 2

              So far it’s been AMD who have all the advantages. Many people became really skeptical of the ARM servers when they saw EPYC Rome. But if Amazon is going all in, producing custom ones… there’s something there. I do hope that Ampere delivers something great this year.

              > I wonder if a viable branch prediction attack mitigation would be to just give everyone a dedicated ARM machine

              Scaleway did that with some 32-bit Marvell thing a while ago. It is kinda interesting, but eliminates the flexibility of VMs where any VM can have any number of cores from 1 to however many the CPU has.

        2. 2

          Does anyone know what would’ve happened if Spectre and all of the other issues hadn’t come to light? Does that even affect the outcome here?

          1. 5

            I doubt the bottom line requests/Joule figure for the Intel system would increase by anywhere near 25% if you disabled all the vulnerability mitigations. The effect is measurable, but not that big on most realistic workloads. Intel perhaps would have pulled ahead on some of the microbenchmarks that were tight. Also bear in mind that AMD’s CPUs were also affected by some of the issues, so those results include a partial mitigation penalty too.

            1. 5

              I don’t know about that. People saw a wide range of impacts, from negligible to 50%, and everything in between. I recall 10–15% was a common range. I’m not sure what impact it has on Cloudflare’s workload, but I can easily imagine that lacking the mitigations, it would no longer have been such a clear victory for AMD.

              1. 2

                “Realistic workloads” usually don’t include a ton of context switches, which I expect CloudFlare’s gear does a lot of.
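                One way to see how switch-heavy a process actually is: on Linux the kernel exposes per-process context-switch counters in /proc. A minimal sketch (Linux-only; the helper name is my own):

                ```python
                def ctxt_switches():
                    # Parse this process's context-switch counters out of
                    # /proc/self/status (Linux-only).
                    counts = {}
                    with open("/proc/self/status") as f:
                        for line in f:
                            if "ctxt_switches" in line:
                                name, value = line.split(":")
                                counts[name.strip()] = int(value)
                    return counts
                ```

                Sampling this before and after a chunk of work gives a rough idea of how often the scheduler is swapping the process out (voluntarily, e.g. blocking on I/O, or involuntarily).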

                1. 2

                  CloudFlare’s team have mentioned that they use a lot of eBPF.

                  1. 2

                    That’s just for filtering, though. I’d expect a ton of context switches on a heavily loaded cache node constantly moving data between user and kernel space so the bits can go from disk/application memory to the kernel and through the network stack. It’s not like they’ve moved Nginx & friends into the kernel.
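                    For what it’s worth, the disk-to-network copying is exactly what sendfile(2) exists to avoid: the kernel moves the bytes between the two file descriptors itself instead of bouncing them through a userspace buffer (each call is still a syscall, though, so it doesn’t eliminate switching entirely). A minimal Python sketch, assuming a Linux host; the helper name is my own:

                    ```python
                    import os

                    def send_file_over(sock_fd, path):
                        # Ship a file down a socket with sendfile(2): the kernel
                        # copies the bytes directly, instead of a read()/write()
                        # loop that drags every byte through userspace.
                        with open(path, "rb") as f:
                            size = os.fstat(f.fileno()).st_size
                            offset = 0
                            while offset < size:
                                sent = os.sendfile(sock_fd, f.fileno(),
                                                   offset, size - offset)
                                if sent == 0:  # peer closed early
                                    break
                                offset += sent
                        return offset  # bytes actually sent
                    ```

                    This is roughly what a cache node’s hot path leans on for static responses; the TLS case is messier (hence kernel TLS offload work), but the copy-avoidance idea is the same.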