1. 64
  1.  

    1. 4

      I’m not sure hosting corporate mouthpiece press releases here is a good precedent. That tends to get abused very quickly. Pieces like this lack any sort of technical content and are mainly just a checkbox on a PR disaster-management plan.

      A better read is over on the mailing list.

      1. 4

        Seems like this is not an ARM or an AMD bug. If so, that’s good news for them, and a second, even bigger wake-up call for Intel after the management processor debacle.

        1. 2

          How do you judge ARM unaffected? I saw the patch regarding AMD, but there is a diff regarding ARM floating around that could be tied to this: https://lwn.net/Articles/740393/

          1. 1

            It sounds like ARM is affected, but the impact is not as severe: http://lists.infradead.org/pipermail/linux-arm-kernel/2017-November/542751.html

            Their benchmarks say that syscalls roughly doubled in cost, but unlike on Intel, the TLB remains intact. The Intel case is particularly bad because the TLB has to be fully flushed on each userspace/kernel transition.
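
            For a sense of scale, something like this crude microbenchmark (a minimal sketch, Linux-specific, with an arbitrary iteration count) makes the per-transition cost easy to measure before and after the patch:

            ```c
            /* Crude syscall-cost microbenchmark: time a cheap syscall in a
             * tight loop. Run it with and without PTI to see how the
             * per-transition overhead changes. */
            #define _GNU_SOURCE
            #include <stdio.h>
            #include <time.h>
            #include <unistd.h>
            #include <sys/syscall.h>

            int main(void) {
                const long iters = 1000000;  /* arbitrary */
                struct timespec start, end;

                clock_gettime(CLOCK_MONOTONIC, &start);
                for (long i = 0; i < iters; i++)
                    syscall(SYS_getpid);  /* raw syscall, so libc can't cache the result */
                clock_gettime(CLOCK_MONOTONIC, &end);

                double ns = (end.tv_sec - start.tv_sec) * 1e9
                          + (end.tv_nsec - start.tv_nsec);
                printf("%.1f ns per syscall\n", ns / iters);
                return 0;
            }
            ```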

            1. 3

              A bit nitpicky, but my read of that is that the bug itself is just as present on ARM as on Intel (unlike AMD, which isn’t affected); it’s just that ARM’s virtual memory design makes it possible to implement the workaround (PTI) with less of a performance hit. That’s a better outcome for ARM, but it’s more luck than better QA: those architectural features weren’t designed for the purpose of implementing something like PTI, they just happen to be useful for it.

              1. 1

                Ah, you’re right, where I said “bug” I meant “bugfix”.

        2. 3

          I’m sure this is a minority opinion, but it would be nice if it were easy to opt out of these changes.

          For my home machines I’m not concerned about the security risk, and would rather have the better performance.

          1. 5

            It looks like booting with the pti=off kernel parameter should get the old behavior back.
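
            If you do flip it, here’s a quick way to double-check what the kernel thinks it’s doing (a sketch that assumes a kernel new enough to expose the sysfs vulnerabilities files; older kernels simply won’t have them):

            ```c
            /* Sketch: print the kernel's own report of the Meltdown mitigation,
             * read from the sysfs vulnerabilities directory. */
            #include <stdio.h>

            int main(void) {
                FILE *f = fopen("/sys/devices/system/cpu/vulnerabilities/meltdown", "r");
                char line[128];
                if (!f) {
                    perror("fopen");  /* file absent on older kernels */
                    return 1;
                }
                if (fgets(line, sizeof line, f))
                    printf("meltdown: %s", line);  /* e.g. "Mitigation: PTI" or "Vulnerable" */
                fclose(f);
                return 0;
            }
            ```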

            1. 2

              > I’m not concerned about the security risk

              We don’t yet know what the security risks are.

              1. 7

                Shared computers are more shared. :)

                1. 1

                  Well, we know it involves user processes reading kernel memory, and I’m confident that I’m not running any malicious user processes that are attempting to do so.

                  And the real issue is almost certainly not as bad as the scaremongering in The Register’s article.

              2. 3

                Time to rewrite all our programs to drastically reduce the number of system calls they make. Not to make the security problem go away, but to shrink the performance impact of the workaround for it. :)
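
                Only half joking, since batching really is the standard fix. A toy sketch of the difference (record count and size are made up):

                ```c
                /* Toy illustration: N small write()s vs one batched write().
                 * The point is amortizing the per-syscall cost that PTI
                 * just made worse. */
                #include <string.h>
                #include <unistd.h>

                #define RECORDS 1000
                #define RECLEN  64

                /* One syscall per record: RECORDS kernel transitions. */
                void write_chatty(int fd, const char recs[RECORDS][RECLEN]) {
                    for (int i = 0; i < RECORDS; i++)
                        write(fd, recs[i], RECLEN);
                }

                /* Coalesce in userspace, then one write(): a single transition. */
                void write_batched(int fd, const char recs[RECORDS][RECLEN]) {
                    static char buf[RECORDS * RECLEN];
                    for (int i = 0; i < RECORDS; i++)
                        memcpy(buf + i * RECLEN, recs[i], RECLEN);
                    write(fd, buf, sizeof buf);
                }
                ```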

                1. 3

                  The main piece of code I work on exports stats to a shared memory segment that we can see in the UI. One of those stats is “avgcommit”, the number of units written per syscall. It is, by far, the most important performance statistic we have.
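
                  For anyone curious, a minimal sketch of that kind of stats export (the segment name, field names, and layout here are all invented, not the real system):

                  ```c
                  /* Hypothetical: a plain struct in a POSIX shared memory
                   * segment that a UI process can mmap and read. */
                  #include <fcntl.h>
                  #include <stdint.h>
                  #include <unistd.h>
                  #include <sys/mman.h>

                  struct stats {
                      uint64_t units_written;    /* total units committed */
                      uint64_t commit_syscalls;  /* total commit syscalls issued */
                  };

                  struct stats *stats_attach(void) {
                      int fd = shm_open("/writer_stats", O_CREAT | O_RDWR, 0644);
                      if (fd < 0)
                          return NULL;
                      if (ftruncate(fd, sizeof(struct stats)) < 0) {
                          close(fd);
                          return NULL;
                      }
                      void *p = mmap(NULL, sizeof(struct stats),
                                     PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
                      close(fd);
                      return p == MAP_FAILED ? NULL : (struct stats *)p;
                  }
                  /* UI side: avgcommit = units_written / commit_syscalls. */
                  ```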

                  1. 1

                    Cool! If you’re looking that closely at it, are you getting into the kind of territory where you might want the storage equivalent of DPDK’s approach, i.e. driving iSCSI or FC HBAs or NVMe controllers directly from userspace instead of going through a kernel filesystem? I think https://software.intel.com/en-us/articles/introduction-to-the-storage-performance-development-kit-spdk is the kind of thing I mean.

                    1. 1

                      We’ve looked into similar things, but the limitations on what hardware we can use and how we interact with legacy systems mean that it’s basically a non-starter. Instead we do some cleverness with how we write both data and metadata, and end up writing about 250-300 units and their metadata per syscall (the original system, written before I got here, used one syscall per unit and one syscall per metadata chunk).

                      The 250-300 units figure matches the rate at which we receive data, so we’re keeping up with the incoming stream. I’ve got some ideas on how to speed things up further, but they’re such radical departures from what we’re doing now that they’d essentially mean a complete rewrite of the subsystem.
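
                      Something like writev() is one way to get many units plus their metadata into a single syscall; this sketch is illustrative only (the batch size and record formats are made up, and nothing here reflects the real system):

                      ```c
                      /* Commit a batch of units plus their metadata in one
                       * syscall via scatter-gather I/O. */
                      #include <sys/types.h>
                      #include <sys/uio.h>

                      #define BATCH 256

                      struct unit { char data[512]; };
                      struct meta { char hdr[32]; };

                      ssize_t commit_batch(int fd, const struct unit *u, const struct meta *m) {
                          struct iovec iov[2 * BATCH];
                          for (int i = 0; i < BATCH; i++) {
                              iov[2 * i].iov_base     = (void *)&m[i];  /* metadata first */
                              iov[2 * i].iov_len      = sizeof m[i];
                              iov[2 * i + 1].iov_base = (void *)&u[i];  /* then the unit */
                              iov[2 * i + 1].iov_len  = sizeof u[i];
                          }
                          return writev(fd, iov, 2 * BATCH);  /* one kernel transition */
                      }
                      ```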

                  2. 3

                    System calls are already ridiculously expensive.

                    1. 2

                      Good thing I’ve got a one-year head start. :)

                      1. 1

                        What, pledge()? I thought that was more of a restriction on variety than on frequency. ;)

                        1. 3

                          No, just running ktrace and asking “why is this program being stupid?”

                    2. 1

                      Seems to be fixed in macOS 10.13.2, too, with some refinements(?) coming in 10.13.3:

                      https://twitter.com/aionescu/status/948609809540046849