1. 47

  2. 8

    Huh. Neat. It definitely feels faster as a user; I’ve ended up working from an M1 MacBook Air instead of an Intel MacBook Pro that on paper should be far more powerful spec-wise.

    1. 8

      Same, I love my M1 Air. I wish my work would’ve issued me one instead of this Intel junker.

      an Intel MacBook Pro that on paper should be far more powerful spec-wise

      Intel Macs aren’t faster on paper either. Benchmarks have repeatedly shown the M1 performance cores are faster than comparable Intel cores. Of course these QoS optimizations help improve user-interactive latency and reduce power usage of background tasks. But all existing evidence suggests M1 would be faster even with the efficiency cores completely disabled.

      And it’s not like M1 is special magic that blows everything else out of the water. AMD Zen2/3 (x86) and AWS Graviton2 (ARM) server processors outperform comparable Intels by embarrassing margins. Since Intel has dropped the ball for the last several years, and the previous gen Macs used Intel, M1 looks like a huge leap forward. The difference wouldn’t be so pronounced if the previous MacBook Pro used an AMD Zen2 CPU.
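
      Incidentally, the QoS band a process runs in can be set from the shell on macOS with the stock `taskpolicy` utility. A minimal sketch (guarded so it’s a harmless no-op on other systems; the wrapped command is just an example):

      ```shell
      # taskpolicy ships with macOS; -b launches a command in the
      # background QoS band, which on Apple Silicon steers it toward
      # the efficiency cores.
      if command -v taskpolicy >/dev/null 2>&1; then
          taskpolicy -b sleep 0   # e.g. wrap a build: taskpolicy -b make -j8
      else
          echo "taskpolicy is macOS-only"
      fi
      ```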

      1. 1

        Intel Macs aren’t faster on paper either. Benchmarks have repeatedly shown the M1 performance cores are faster than comparable Intel cores. Of course these QoS optimizations help improve user-interactive latency and reduce power usage of background tasks. But all existing evidence suggests M1 would be faster even with the efficiency cores completely disabled.

        Hm, fair. I think I meant more “given the Intel machine cost 3x the M1 MBA”, then. It has more memory and a bigger SSD too, although that doesn’t have much (any?) bearing on CPU performance, admittedly.

    2. 4

      Because Macs with Intel processors can’t segregate their tasks onto different cores in the same way, when macOS starts to choke on something it affects user processes too.

      With modern multi-core CPUs, it seems natural to pin threads/processes to different cores. It would certainly speed up high-priority tasks when the on-chip L2 caches are separate. Why doesn’t macOS do that on Intel CPUs?

      1. 3

        With modern multi-core CPUs, it seems natural to pin threads/processes to different cores.

        You might be surprised. When you migrate a thread to a different core, that thread will experience a load of L1 cache misses, but snooping from another core’s cache is pretty cheap, so you take a very small slowdown until the L1 is warm and then none.

        If you leave a thread pinned to a core, that core develops hot spots and is thermally throttled, but unless your system is fully loaded, other cores will be cooler. Last time I benchmarked this, turning off CPU affinity made things faster for anything except very homogeneous workloads (even with full CPU usage, if you have a diverse instruction mix between programs then they may warm up different bits of a core, and so may be cooler overall if you swap them around a bit).

        Note that this is not true for multi-socket machines, where operating systems try to allocate memory from directly attached memory controllers, and so DRAM latency goes up if you migrate a thread to another socket.
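
        On Linux you can experiment with this yourself using util-linux’s `taskset`; a quick sketch (the benchmark command is a placeholder):

        ```shell
        # Pin a command to core 0 and confirm the kernel recorded the mask:
        taskset -c 0 grep Cpus_allowed_list /proc/self/status
        # prints: Cpus_allowed_list: 0

        # To compare, you would time the same workload pinned and unpinned, e.g.:
        #   time taskset -c 0 ./bench    (pinned)
        #   time ./bench                 (free to migrate)
        ```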

        1. 1

          Wouldn’t the M1 experience the same thermal issue, as the 4+4 cores are on the same chip? So the difference is that the M1 is able to throttle its cores separately but Intel chips throttle uniformly? I imagine pinning processes to separate sockets/NUMA domains would help a lot.

      2. 4

        I recently got an 8GB M1 Pro for work and was extremely skeptical of the tiny amount of memory, but I’ve been consistently blown away by the performance and responsiveness of the machine. Super interesting read digging into how that works beyond raw performance.

        1. 2

          I’d be surprised if the background tasks weren’t being run with a high niceness (low priority) on Intel Macs too?

          I guess the downside of using full power cores for background jobs is that, even though the scheduler makes them yield time slices as soon as a high-priority job shows up, they still heat up your MacBook, so it’s already thermally throttling by the time you start your high-priority task.
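
          On Linux, the moral equivalent is launching background work at the bottom of the niceness scale; a quick sketch:

          ```shell
          # Launch a job with the lowest scheduling priority (niceness 19)
          # and read the value back from the kernel; field 19 of
          # /proc/self/stat is the process's nice value.
          nice -n 19 awk '{print $19}' /proc/self/stat
          # prints: 19
          # (equivalently: nice -n 19 sh -c 'ps -o ni= -p $$')
          ```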

          1. 2

            There seem to be a few benefits to this:

            • rogue background processes are limited to at most half of the CPU’s time (less, really, since the efficiency cores are slower); mdworker is a great example, I’ve personally had that thing max out cores
            • you will nearly always have 4 cores ready to pick up jobs that the user wants done, instead of waiting for context switches and other threads to get out of the way
            • as you mentioned, the computer is already half as hot (or less) as it would have been if all of the cores were doing things
            • with 4 cores limited to background jobs, there is potentially even less time slicing for UI jobs, which will definitely make them faster if they are nearly the only thing using “that” core

            For a GUI environment this seems like a really good idea, honestly. You could likely achieve the same feel with an Intel chip if you were able to dedicate half of the cores to background jobs, as Apple has.
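
            On Linux you can get part of the way there today: SCHED_IDLE tells the scheduler a task should only run when nothing else wants the CPU, and an affinity mask confines it to chosen cores. A sketch (the job name is hypothetical):

            ```shell
            # SCHED_IDLE (chrt -i 0) plus an affinity mask would confine a
            # background task to cores 4-7, keeping 0-3 free for the UI.
            # (indexer_job is a placeholder for a real background task.)
            #   chrt -i 0 taskset -c 4-7 indexer_job

            # Dropping yourself to SCHED_IDLE needs no privileges:
            chrt -i 0 sh -c 'chrt -p $$'
            # prints the policy: SCHED_IDLE
            ```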

            1. 3

              (mdworker is a great example I myself have had that thing max out cores)

              Fuck, I hate the entire concept of desktop search so much.

              I have never, ever seen a desktop search system work. I type in a query that should match the content of a recently used document. No results. I type in something that should match its filename. Also no results. I curse the gods and give up and find it manually or grep or something.

              I’ve had exactly this experience with the desktop search systems that come with Windows 10, with macOS, and with KDE; I can’t remember if GNOME even has one, but if it does, I haven’t seen it succeed.

              So much fucking electricity wasted and literally zero occasions where the fucking things have ever been useful.

              The only desktop search applications I’ve ever seen work are locate (mlocate) and the simple, slow, reliable one that came with Windows 9x, which didn’t index anything in advance; it just searched the whole disk on demand. But that one was actually useful.

          2. 2

            I’m wondering what will happen with ‘security’ software. If you can’t use a file until an ‘on access’ scanner has checked it, would you want that file to be scanned by a fast core? This would mean the machine heats up and the fast core is busy doing something other than helping with interactivity / compiling / etc.

            1. 1

              I guess ideally it’d inherit the performance profile of the thread it’s being opened by. I imagine it doesn’t happen that way though. Most commercial on-access scanners I’ve seen don’t even offer Apple Silicon versions currently.

            2. 1

              Linux users:

              echo 1 | sudo tee /sys/devices/system/cpu/cpufreq/ondemand/ignore_nice_load
              

              This takes care of the cpu frequency aspect of it. I’m still envious of those efficiency cores, though.
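
              Worth noting that this knob is only exposed by the ondemand and conservative governors, so it pays to check before writing to it. A defensive sketch:

              ```shell
              # ignore_nice_load only exists under the ondemand/conservative
              # cpufreq governors; probe for it before poking it.
              knob=/sys/devices/system/cpu/cpufreq/ondemand/ignore_nice_load
              if [ -e "$knob" ]; then
                  cat "$knob"   # 0 = nice load counts toward freq scaling, 1 = ignored
              else
                  echo "ondemand governor not active (or no cpufreq exposed)"
              fi
              ```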

              1. 1

                Do background jobs run with a high niceness (low priority) in Linux, though? It’s my impression that niceness is woefully underused in the Linux world.

                1. 1

                  Good question.

                  rg -li nice= /usr/lib/systemd/system/ | wc -l
                  6
                  

                  It is used; it’s just that those units aren’t running most of the time (things like log rotation). The “nice” CPU load is shown in blue in htop, but in my experience you never see it unless you’re running a CPU hog like BOINC.
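
                  For anyone who wants to opt a service in themselves, it’s a two-line drop-in; both directives are documented in systemd.exec(5) (the unit name here is hypothetical):

                  ```ini
                  # /etc/systemd/system/indexer.service.d/background.conf  (hypothetical unit)
                  [Service]
                  Nice=19
                  CPUSchedulingPolicy=idle
                  ```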