1. 53

  2. 15

    Wow. That hash-based bisection technique is really something.

    Part of the reason “it’s never the compiler” is that it takes a lot of expertise to even come up with a decisive test for whether it’s the compiler!

    1. 7

      Excellent work. You can really see all the hard-earned experience that goes into solving a problem like this.

      1. 6

        Fascinating read, including the RAM heating aside!

        I’m curious about the implications of that racy ORQ stack probe though. If that’s actually not a no-op due to reading and writing back memory, isn’t that still a likely cause of even more obscure bugs? Say, for a concurrent program sharing its stack space between multiple threads. Could the GCC probe be done in a safer (or more obviously safe) way?

        EDIT: The LKML thread goes into the details a bit further: https://lkml.org/lkml/2017/11/10/188

        1. 3

          It shouldn’t cause any bugs. The mitigation works by writing to each page that’s beyond your current end of stack, i.e. uninitialized memory, up to the amount of stack your function needs. It’s trying to hit the guard page at the end of the stack. Lots of good details in this stack clash exploit write-up.

          1. 2

            “Say for a concurrent program sharing its stack space between multiple threads”

            If I understand right, it’s okay because it’s probing beyond the end of the stack. I’m not allowed to use a region on my stack beyond the current stack frame for anything (on this thread or any other) without invoking UB. With the stack protection scheme in use there, guard pages at the ends of stacks must be mandatory, so the end of a stack is never close enough to another data structure to be in danger.

          2. 4

            That is a crazy bug. So I wasn’t wrong to be suspicious of M:N threading? I understand why Go went that route, but the complexity is crazy.

            Here is another example of a years-long bug related to M:N threading (which I saved years ago):


            1. 1

              The only runtime that gets this right is Erlang/OTP’s BEAM, with pre-emptive concurrency.

              1. 1

                I’d argue that the bug is having direct calls between code produced by two different compilers that disagree about what the ABI is, rather than M:N threading itself being the culprit. You could hypothetically want really tiny stacks for some reason other than M:N threading and get the same bug?
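
The stack probing discussed in this thread can be modelled roughly as follows. This is only a sketch under stated assumptions, not GCC's actual code generation: it assumes a downward-growing stack and 4 KiB pages, and every address, size, and function name here is hypothetical. It shows why a probe is harmless when a guard page sits below the stack, and why the same probe touches unrelated memory when a very small stack has no guard page.

```python
PAGE_SIZE = 4096  # assumed page size; real systems may differ


def probe_addresses(sp, frame_size, page_size=PAGE_SIZE):
    """Addresses touched below `sp`: one per page, plus the far end of the frame."""
    addrs = [sp - off for off in range(page_size, frame_size, page_size)]
    addrs.append(sp - frame_size)
    return addrs


def first_guard_hit(probes, guard_lo, guard_hi):
    """First probe inside the guard page [guard_lo, guard_hi), or None."""
    for addr in probes:
        if guard_lo <= addr < guard_hi:
            return addr
    return None


def outside_stack(probes, stack_lo, stack_hi):
    """Probes that fall outside the stack's own memory [stack_lo, stack_hi)."""
    return [a for a in probes if not (stack_lo <= a < stack_hi)]


# Ordinary thread stack (hypothetical addresses) with a guard page directly below it.
thread_probes = probe_addresses(sp=0x100800, frame_size=3 * PAGE_SIZE)
hit = first_guard_hit(thread_probes, guard_lo=0xFF000, guard_hi=0x100000)

# Tiny 2 KiB stack carved out of other memory, with no guard page below it.
tiny_probes = probe_addresses(sp=0x200400, frame_size=PAGE_SIZE + 16)
stray = outside_stack(tiny_probes, stack_lo=0x200000, stack_hi=0x200800)
```

In the first case the very first probe lands in the guard page, so an oversized frame faults cleanly. In the second, every probe lands in memory that does not belong to the stack at all, which is where a read-modify-write probe could race with whatever else lives there.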
