1. 41
  1. 12

    (not trolling) I expected this to be an AWS EC2 instance or something else cloud-based, due to the 2022 in the title.

    I recently ran into an issue where I had a few hundred GB of compressed, encrypted data that was destroying my patience. In the time it took my daily driver to do part of the decompression, I was able to spin up a 32-core, 64GB RAM instance, pull the data down, unpack everything, and run my analytics across it before downloading the results and burning the EC2 instance to the ground. I’ve never been a fan of using someone else’s computer for my compute cycles, but that experience definitely left me considering where I really need to own the hardware, instead of letting someone else invest in it and just borrowing it for a little while.

    I am certainly not upset it isn’t cloud-based, though. I’m always a fan of good cable management and those pictures are gorgeous!

    1. 2

      Cloud computing can be very useful like that; I just find it rather inconvenient.

      I have https://blog.nelhage.com/post/distributed-builds-for-everyone/ on my list of things to try, though, so perhaps that changes my take on the subject :)

      1. 3

        Thanks for the link! I read his post about building LLVM in 90 seconds and that was partly what got me thinking about trying out AWS to skip steps when my local bandwidth and memory were the limitations.

        Honestly, it also made me eyeball your benchmarks in the article and wonder whether there was a way to cheat and beat those numbers with AWS. More specifically: how much would it cost to beat those numbers? My suspicion is that the overhead would take too long for that to be viable (since it only seems fair to include the time it takes to get the instance running).

        1. 1

          The cloud has a long way to go for ease of deployment, but there is some indication that it’s getting there. There’s a gradual trend from VMs to containers[1] to FaaS. Each of these is making it easier to deploy small things. It’s still a lot harder to deploy a cloud ‘hello world’ than a local one, and there’s also a lot of friction between cloud and local development. My bet for the next decade would be that we see a lot more unikernel-like development environments where the toolchains include cloud deployment as part of the build infrastructure, so you can develop code that runs locally on your OS, locally in a separate VM, or in your favourite cloud provider’s infrastructure.

          [1] Container is an overloaded term. I mean it here as a unit of distribution / deployment; it is often also conflated with shared-kernel virtualisation. Cloud container deployments typically don’t use a shared kernel (or, if they do, share it only within a single tenant) for security reasons. The benefit of containers is that you can easily build a very simple image for your VM.

      2. 2

        For some reason, the transfer was super slow. Last time I transferred the contents of a Samsung 960 Pro to a Samsung 970 Pro, it took only 16 minutes. But this time, copying the Force MP600 to a WD Black SN850 took many hours!

        I don’t really think the slow transfer rate had anything to do with the 5M block size you used with dd, but a simple

        $ cat /dev/disk/by-id/nvme-Force_MP600_<TAB> | pv > /dev/disk/by-id/nvme-WD_BLACK_SN850_2TB_<TAB>

        should have done the trick as well. I wonder whether this would have been faster.
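For anyone who wants to poke at the block-size question on ordinary files rather than raw NVMe devices, here is a minimal Python sketch of a chunked copy in which `block_size` plays the role of dd's `bs=`. The helper name `chunked_copy` is hypothetical, and this does not reproduce dd's or pv's internals; it only shows the read-a-block, write-a-block loop and times it.

```python
import os
import time


def chunked_copy(src_path: str, dst_path: str,
                 block_size: int = 5 * 1024 * 1024) -> tuple[int, float]:
    """Copy src_path to dst_path in block_size chunks, similar to dd's bs=.

    Hypothetical helper for comparing block sizes on ordinary files.
    Returns (bytes_copied, seconds_elapsed).
    """
    copied = 0
    start = time.perf_counter()
    # buffering=0 opens raw (unbuffered) files, so each read() asks the OS
    # for up to block_size bytes in a single call.
    with open(src_path, "rb", buffering=0) as src, \
            open(dst_path, "wb", buffering=0) as dst:
        while True:
            chunk = src.read(block_size)
            if not chunk:  # empty bytes means end of file
                break
            # Raw writes may be partial, so keep writing until the chunk is done.
            view = memoryview(chunk)
            while view:
                written = dst.write(view)
                view = view[written:]
            copied += len(chunk)
        # Flush to stable storage so the timing includes the actual disk writes.
        os.fsync(dst.fileno())
    return copied, time.perf_counter() - start
```

Running it on the same large file with, say, `block_size=4096` and `block_size=5 * 1024 * 1024` and comparing bytes per second gives a rough feel for how much the block size matters on a given machine; pointing anything like this at raw devices needs root and a lot of care.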

        1. 1

          That system would fly with a BSD. I’d love to see some benchmarks between FreeBSD and ${LINUX_DISTRO_OF_CHOICE} on that system.

          1. 5

            While not the gen 12, here’s a recent BSD versus Linux comparison. A quick skim of the benchmarks leaves me with the impression that FreeBSD isn’t any faster than Linux.


            1. 18

              Because it’s not. Once upon a time BSD had better SMP scheduling and more scalable kernel structures than Linux but that time is long gone. Hyperscalers like Google run Linux on 100+ core multi-socket machines, and have optimized the hell out of it.

              1. 7

                Perhaps 15-20 years ago FreeBSD was generally faster than Linux, but it hasn’t really been since, except for some specific applications. The momentum is just too much of an advantage for Linux.

                1. 1

                  Immense code churn is not always a good thing. Properly engineered code churn is. :-)

                  I think you’ll find that the BSD communities take a more academic, engineered approach, which tends toward stability, scalability, and robustness.

                  Speed on microbenchmarks (like the Phoronix Test Suite) should not be the sole benchmark. How well a system maintains state, stability, and performance over long periods of time can be an extremely useful benchmark (in addition to other factors).

                  I think today’s headline-based benchmarks miss the mark on how systems truly perform. They focus on quick points of minutiae and completely ignore the long term in hopes of making headlines.

                  What the Linux community gets right, though, is marketing. A lot of the “how to do this on Linux” articles I read apply to the BSDs as well. That causes people to incorrectly believe that there’s less momentum in the BSD camp.

                  1. 8

                    This is a completely different point than what you raised originally. You explicitly asked for comparison benchmarks, and when they don’t show BSD being faster you shift to discussing stability and dismiss benchmarks.

                    Personally I don’t consider an academic approach to necessarily be the best, but I also see value in both of the approaches of BSD and Linux. I do think it’s disingenuous to imply that BSD is “more engineered” than Linux.

                    1. 1

                      This is a completely different point than what you raised originally. You explicitly asked for comparison benchmarks, and when they don’t show BSD being faster you shift to discussing stability and dismiss benchmarks.

                      I see how it might seem as though I switched subjects slightly. I take a little bit of a different view on what comprises real-world benchmarks and I should’ve been a bit clearer on that. I apologize for the misunderstanding and my lack of clarity.

                      As a somewhat related tangent: at my previous day job, I helped maintain a Linux-based big data cluster. In my spare time, out of curiosity, I ported the big data cluster to HardenedBSD on the more limited hardware I had at the time.

                      The Linux-based cluster had a freakton more resources than my HBSD-based infrastructure, yet we still had to reboot certain systems every few days due to stability issues in Linux. In contrast, my HBSD-based infra could ingest the same amount of data, and I left it running for a few months without rebooting.

                      I think these kinds of real-world scenarios should be a part of any pertinent benchmarking paradigms. Running ffmpeg or gcc doesn’t really tell me what to expect under real-world workloads.

                      I recognize the entirety of the example I gave is subjective and anecdotal, but I’m thinking that the example can be taken in a more general context.

            2. 1

              I tried on my system (which isn’t THAT old) and it took 48s to compile Go :(. It’s mostly a single-core build, so having more cores doesn’t really make a difference, and the latest Intel cores aren’t that much faster. I would guess the WD Black SSD accounts for most of the difference; I have a random off-brand SSD.
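The intuition that extra cores stop helping a mostly serial build is usually formalized as Amdahl's law. As a sketch, assuming a fraction $p$ of the build's work parallelizes perfectly across $n$ cores and the remainder stays serial, the ideal speedup is:

```latex
S(n) = \frac{1}{(1 - p) + \frac{p}{n}},
\qquad
\lim_{n \to \infty} S(n) = \frac{1}{1 - p}
```

With a purely illustrative $p = 0.5$, even unlimited cores give at most a $2\times$ speedup, while $p = 0.9$ caps it at $10\times$. Which regime `./make.bash` is in is an empirical question that this formula alone does not answer.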

              1. 5

                40s on an M1 Pro Mac.

                1. 2

                  ./make.bash 278.53s user 44.97s system 374% cpu 1:26.27 total

                  45s on a battery-powered, fanless MacBook Air. The M1 ARM processors are the most notable performance boost since SSDs made their way into personal computers.

                  1. 3

                    If that is time output, then it took 1 minute and 26 seconds in total. 45s is the time spent in the kernel for the process (e.g. for handling system calls).
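To make the distinction concrete, here is a small Python sketch that parses a zsh-style `time` line like the one quoted above. The regex is an assumption built only from the lines pasted in this thread, not from zsh's `TIMEFMT` documentation. It treats `total` as wall-clock time and recomputes the CPU percentage as (user + system) / total.

```python
import re

# Pattern based on the zsh `time` lines quoted in this thread, e.g.
# "./make.bash 278.53s user 44.97s system 374% cpu 1:26.27 total".
# Assumption: elapsed time looks like "SS.ss" or "M:SS.ss"; runs of an hour
# or more (with an hours field) are not handled by this sketch.
ZSH_TIME = re.compile(
    r"(?P<user>[\d.]+)s\s+user\s+(?P<sys>[\d.]+)s\s+system\s+"
    r"(?P<cpu>\d+)%\s+cpu\s+(?:(?P<min>\d+):)?(?P<sec>[\d.]+)\s+total"
)


def parse_zsh_time(line: str) -> dict[str, float]:
    """Split a zsh time line into user, system and wall-clock seconds."""
    m = ZSH_TIME.search(line)
    if m is None:
        raise ValueError(f"not a zsh time line: {line!r}")
    user = float(m["user"])
    system = float(m["sys"])
    # "total" is elapsed wall-clock time, not user + system.
    wall = int(m["min"] or 0) * 60 + float(m["sec"])
    return {
        "user": user,
        "system": system,
        "wall": wall,
        "reported_cpu_percent": float(m["cpu"]),
        # CPU time / wall time: above 100% means more than one core was busy.
        "cpu_percent": 100 * (user + system) / wall,
    }
```

For the MacBook Air line, this yields a wall time of 86.27s and roughly 375% CPU, in line with the 374% zsh printed; the small gap is presumably rounding of the displayed values.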

                    1. 3

                      🙈 You’re absolutely correct, I was confused by the time output. On my Linux machine, time in bash has a different output format:

                      real  0m52.402s
                      user  4m8.435s
                      sys  0m18.317s

                      which I prefer. Anyways, the M1 machine is plenty fast for my use case.
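bash's `time` reports the same three quantities in a different layout. A minimal sketch, assuming bash's default output format (no custom `TIMEFORMAT`) as pasted above, converts each field to seconds so it can be compared directly with the zsh numbers:

```python
import re

# One line of bash's default `time` report, e.g. "user  4m8.435s".
# Assumption: TIMEFORMAT is unset, so every field is "<minutes>m<seconds>s".
BASH_FIELD = re.compile(r"^(real|user|sys)\s+(\d+)m([\d.]+)s$")


def parse_bash_time(report: str) -> dict[str, float]:
    """Convert a pasted bash `time` report into seconds per field."""
    fields: dict[str, float] = {}
    for line in report.splitlines():
        m = BASH_FIELD.match(line.strip())
        if m:  # skip blank lines and anything that is not a time field
            name, minutes, seconds = m.groups()
            fields[name] = int(minutes) * 60 + float(seconds)
    return fields


report = """real  0m52.402s
user  4m8.435s
sys  0m18.317s"""
t = parse_bash_time(report)
# Average number of busy cores: CPU time divided by wall-clock time.
print(f"{(t['user'] + t['sys']) / t['real']:.2f}")  # prints 5.09
```

Here `real` plays the role of zsh's `total`, so this Linux run kept about five cores busy on average (roughly 509% in zsh's notation).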

                      1. 2

                        I was confused by the time output.

                        Happens :), especially with the different orders.

                        Anyways, the M1 machine is plenty fast for my use case.

                        I also had a MacBook Air M1 before the MacBook Pro 14”. The M1 Air is an awesome machine for development. And I’d say that for most people who don’t need >16 GB RAM, a lot of cores, or many displays, the MacBook Air M1 is the better machine.

                    2. 1

                      I’m not sure if the ARM and x86 compilers are really doing the same work unfortunately.

                      1. 1

                        If people are just compiling for the native target, it will be a bit different at the end, yeah. But typically little time is spent in the backend, so it doesn’t matter that much. But this is Go, which has much less going on in the frontend/middle end than something LLVM-based.

                    3. 2

                      I also get 40s on a M1 Pro and 35s on a Ryzen 5900X.

                      I am still amazed that my laptop, which I can put in my backpack and whose fans barely spin up at all, is close to a 105W TDP Ryzen 5900X. It’s also not that far from the Intel Core i9-12900K from the article, which has a base power of 125W and a maximum turbo power of 241W.

                      My default is now to do all development on the M1 Pro, including training basic machine learning models (the AMX matrix multiplication co-processors make training small networks pretty fast). I only use a headless GPU machine for training/finetuning large models.

                      1. 1

                        I get similar timings on my Ryzen 5600X: 39-40 seconds. My setup is optimized for noise and size, though (it’s an SFF PC), and the CPU is deliberately set to limit its temperature to 75°C. This way I can keep the fans super quiet, with their speed only increasing under prolonged load. I think I also slightly undervolted the CPU, but I’m not entirely sure.

                        I did experiment with faster fan speeds, but found it didn’t matter much unless I really crank them up. Even then we’re talking about a difference of maybe a few degrees Celsius. IIRC the Ryzen CPUs simply run a little on the hotter end when under load.

                        The M1 is definitely an impressive CPU, and I really hope we start to see some more diversity in the CPU landscape over time.

                    4. 2

                      49s on a Ryzen 9 3900X and 3x Samsung 860 EVO (built in 2020). But honestly no complaints, that’s not bad for a whole damn compiler and language runtime.

                      1. 1

                        It should go faster. Was that with PBO, manual OC or stock? What RAM settings?

                        1. 1

                          RAM is 3200MHz CL16 with XMP settings. I don’t remember how I’ve set up OC, but I didn’t push it; these days I’d rather have it work 100% of the time than have it go 5% faster but crash once a week :)

                        2. 1

                          I had a 3700X before. If your mainboard supports it, it’s worth considering upgrading to a 5900X some time. E.g. it builds Go in 35s, so it’s a nice speedup.

                          1. 1

                            I’m wondering if it’s time to replace my TR 2950X.

                            ./make.bash 381.03s user 41.99s system 632% cpu 1:06.84 total

                            1. 1

                              I guess my machine was not fully optimized.

                              ./make.bash 299.47s user 28.67s system 646% cpu 50.781 total

                          2. 1

                            It’s not really single-core and storage doesn’t have that much of an impact.

                            NFS share: ./make.bash 193.40s user 36.32s system 690% cpu 33.273 total

                            in-memory tmpfs: ./make.bash 190.17s user 35.55s system 708% cpu 31.843 total

                            (this is not exactly apples-to-apples: I’m building 1.17.6 with 1.17.4; this is on a 5950X with PBO on and RAM slightly above XMP, and an OS that doesn’t support CPPC; oh, and the caches are probably pretty warm for NFS since I extracted the archive on the same client machine)
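One way to read these numbers: dividing total CPU time (user + system) by wall-clock time gives the average number of cores kept busy over the run, which matches the 690% and 708% `cpu` figures printed for the two runs above:

```latex
\bar{P}_{\text{NFS}} = \frac{193.40 + 36.32}{33.273} \approx 6.90,
\qquad
\bar{P}_{\text{tmpfs}} = \frac{190.17 + 35.55}{31.843} \approx 7.09
```

An average of roughly seven busy cores is hard to square with a mostly single-core build, although an average on its own can still hide shorter serial stretches.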

                            1. 1

                              It’s mostly a single-core build, so having more cores doesn’t really make a difference, and the latest Intel cores aren’t that much faster.

                              Here is a CPU profile for 30s of the ./make.bash build:


                              There are a lot of concurrent regions during the build, so I definitely wouldn’t say it’s mostly single-core.