1. 30

More complaining about undefined behavior (UB) and so-called “optimizations”.


  2. 17

    I’m not convinced that “somebody once wrote code that did this”, whether that be OpenBSD or anyone else, is actually a solid argument that it’s wise. People also wrote (and continue to write) all sorts of bullshit code that assumes int and long are particular sizes, and this code “works perfectly” on 32-bit platforms but fails spectacularly on 64-bit platforms. We have no trouble identifying 64-bit-unsafe code as wrong, but it’s exactly the same. The expert wizard programmer, smarter than the compiler and the silly C standards weenies, wrote some code just the way they like it and therefore it must be correct.
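
    A minimal sketch of the size assumption described above and one way to avoid it. The helper name read_le32 is hypothetical; the point is only that a fixed-width type behaves the same on 32-bit and 64-bit targets, and that a compile-time check turns a wrong assumption into a build failure.

    ```c
    #include <assert.h>
    #include <limits.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Compile-time checks: a wrong assumption stops the build instead of
     * silently misbehaving at run time. */
    _Static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");
    _Static_assert(sizeof(uint32_t) == 4, "uint32_t must be 4 bytes wide");

    /* Hypothetical helper: decode a 32-bit little-endian field.
     * uint32_t has the same width everywhere it exists, whereas
     * `unsigned long` is 32 bits on some ABIs and 64 bits on others. */
    uint32_t read_le32(const unsigned char *p)
    {
        assert(p != NULL);
        return (uint32_t)p[0]
             | ((uint32_t)p[1] << 8)
             | ((uint32_t)p[2] << 16)
             | ((uint32_t)p[3] << 24);
    }
    ```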

    1. 0

      Nobody argued it was wise. However, the proposal is to change the standard to make working code, however unwise, silently change behavior. To me, “we’re going to silently break widely used crypto code” is indefensible. Want to do everything “right”? Write a new language.

      1. 5

        I’d really like to see that widely used crypto code that will break.

        Where is it and who uses it?

        1. 4

          People don’t, not for remotely decent crypto. @vyodaiken is just wrong. Even if they were right, there are always ways to get around these optimizations. For an actual real example, consider explicit_bzero, which ensures the zeroing won’t be optimized out.
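
          For platforms that lack explicit_bzero, the same idea is often approximated by writing through a volatile pointer. The sketch below is an assumption about how such a fallback might look, not the implementation of explicit_bzero itself; secure_zero is a hypothetical name. Where explicit_bzero is available (the BSDs, glibc), prefer it.

          ```c
          #include <assert.h>
          #include <stddef.h>

          /* Hypothetical fallback. Stores through a volatile-qualified
           * pointer are treated by compilers in practice as observable
           * side effects, so the loop is not removed as a dead store.
           * A sketch only, not a vetted crypto primitive. */
          void secure_zero(void *buf, size_t len)
          {
              assert(buf != NULL || len == 0);
              volatile unsigned char *p = buf;
              while (len--) {
                  *p++ = 0;
              }
          }
          ```

          A caller would invoke it on a key buffer just before the buffer goes out of scope, which is exactly where a plain memset is most likely to be optimized away.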

          1. -1

            I quoted my source - look it up and explain your reasoning. And “there are always ways to get around these optimizations” is hardly a defense. The programmer should be addressing the application, not grappling with the need to “get around” optimizations that don’t actually optimize the code.

            1. 2

              Did you actually read the source and its references?

              Your quote is referring to a five-year-old blog post that describes a problem in a never used fallback path of the seeding code for a deterministic prng nobody should’ve been using for crypto in a long time. A problem that OpenBSD for instance fixed 11 years ago. And even then better APIs had been around for a long time.

              How does any of that support the notion that widely used crypto is about to break silently?

              1. 0

                Yes, I did.

    2. 15

      Why would you want to use uninitialized memory for entropy when arc4random exists? To me it seems entirely pointless to use undefined behavior for acquiring low-quality entropy when the system provides a much better source already.
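
      As a rough illustration of “use the system’s source instead”, here is a sketch that reads seed bytes from the kernel’s random device. It assumes a Unix-like system that provides /dev/urandom, and fill_random is a hypothetical helper; where arc4random_buf or getentropy is available, that is the simpler choice.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdio.h>

      /* Hypothetical helper: fill buf with len bytes from /dev/urandom.
       * Returns 0 on success and -1 on any failure, so the caller never
       * falls back to whatever happened to be in the buffer. */
      int fill_random(void *buf, size_t len)
      {
          assert(buf != NULL || len == 0);
          FILE *f = fopen("/dev/urandom", "rb");
          if (f == NULL) {
              return -1;
          }
          size_t got = fread(buf, 1, len, f);
          fclose(f);
          return got == len ? 0 : -1;
      }
      ```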

      1. 2

        The implication is that on some BSD and OpenSSL implementations, they use uninitialized memory in the arc4random code.

        In any event, to me, the question is why would you want to break working code?

        1. 10

          The standards committee would argue that the code has always been broken, it’s just “worked” because compilers haven’t been as aggressive as they could be.

          From a compiler perspective, it’s hard to distinguish between “this should obviously work” and undefined code that can be eliminated for a legitimate performance win. In this example, an expression optimizer that previously deleted all effects with indeterminate args now needs to prove that the expression doesn’t reduce to a constant. The alternative would be to remove the indeterminacy check entirely, which would be a non-starter for teams with a “no performance regressions” policy. Remember that this sort of thing doesn’t exist just to break people’s “obvious” code: you can save on code size by removing impossible branches of a function after inlining, for example.

          Also, if this were to be “fixed” in the standard, what would the fix be? You could say that an expression is determinate if, for all possible values for the indeterminate inputs, the output is the same, but this raises questions about value range (e.g. if int a = indeterminate & 1;, what can you say about a?) You could replace the concept of “indeterminate” with value ranges where an uninitialized value has the widest possible range, but then when can you eliminate formerly-indeterminate effects? Is having a range greater than a single value enough? And remember, both of these increase the burden on compiler writers considerably, and if this were introduced in a new standards version several compilers would need to disable a lot of code until it could be properly fixed.
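
          The “same output for all possible inputs” definition can be made concrete with a toy model. The sketch below is only an illustration, under the assumption that an indeterminate byte is modelled as “any value from 0 to 255”; the names are hypothetical and no compiler works by brute force like this.

          ```c
          #include <assert.h>
          #include <stdbool.h>
          #include <stddef.h>

          /* An expression over one possibly-indeterminate byte. */
          typedef int (*byte_expr)(unsigned char);

          int xor_self(unsigned char x) { return x ^ x; }  /* always 0 */
          int low_bit(unsigned char x)  { return x & 1; }  /* 0 or 1   */

          /* Returns true if f yields one single result for every possible
           * input byte, i.e. the result does not depend on the input. */
          bool is_determinate(byte_expr f)
          {
              assert(f != NULL);
              int first = f(0);
              for (int v = 1; v <= 255; v++) {
                  if (f((unsigned char)v) != first) {
                      return false;
                  }
              }
              return true;
          }
          ```

          Under this model x ^ x is determinate while x & 1 is not, even though its range is only {0, 1}, which is the value-range question raised above.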

          1. -2

            That would be a false argument. Note that the claim is that the contents of the variable are “implicitly” UB. The standard did not explicitly rule this read to be undefined - this is an expansion of UB. In fact it is an expansion of UB into territory explicitly exempted from UB; the exception for character data was intended to cushion the ill effects of previous UB edicts. More importantly, I’d really like to see an example of a non-trivial optimization - something that actually made working code run faster or take up less space using this approach. I can see how many micro-“optimizations” are possible but what are you optimizing when you claim the programmer intention is unknowable?

            “(e.g. if int a = indeterminate & 1;, what can you say about a?)” - that it is either 0 or 1. Indeterminate is an implicit I/O-like operation. There is nothing you can do to eliminate indeterminate effects and there is no need to do so. Compiler writers are solving the drunkard’s lamppost problem: it’s easier to do “optimizations” if you knew or could assume something about these values, so let’s assume that we do or can.

            C programmers would be much better served by compilers that did better static analysis than by compilers that produce unexpected program failure that is labeled “optimization”.

            “And remember, both of these increase the burden on compiler writers considerably, and if this were introduced in a new standards version several compilers would need to disable a lot of code until it could be properly fixed.”

            Far better than to impose silent failures on working code in e.g. widely used crypto applications.

            1. 15

              “Far better than to impose silent failures on working code in e.g. widely used crypto applications.”

              If you’re talking about seeding an RNG from stack “garbage” that approach was always crap. It never worked. The garbage on the stack is always the same.

          2. 9

            Because that code relies on explicitly undefined behavior.

        2. 11

          UB discussion aside, please note that x[n] ^= x[n] is not faster than x[n] = 0. Assuming the compiler doesn’t clean up after you, XORing a memory location with itself will take more instructions, more registers, and a memory load/store instead of just a store. This is true on X86 and every other architecture I know of.

          mov qword ptr [rax+rbx], 0
          

          vs

          mov rcx, [rax+rbx]
          xor [rax+rbx], rcx
          

          The “xor to 0” pattern only makes sense for registers, and then only because of instruction encoding quirks on X86. On architectures with fixed-length instructions there’s no reason for this silliness. And again, your compiler will almost certainly pick the correct mechanism here regardless of whether the code is x=0 or x^=x. Unless, of course, x is indeterminate.
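
          In C terms, the two patterns being compared look like the sketch below (function names are hypothetical). Both are well defined here only because the array is assumed to hold initialized values before the call.

          ```c
          #include <assert.h>
          #include <stddef.h>

          /* Plain store: one write per element. */
          void zero_by_store(unsigned *x, size_t n)
          {
              assert(x != NULL || n == 0);
              for (size_t i = 0; i < n; i++) {
                  x[i] = 0;
              }
          }

          /* XOR with itself: as written, each element is loaded, XORed
           * and stored back. An optimizer will normally reduce this to
           * the plain store when x[i] holds a defined value. */
          void zero_by_xor(unsigned *x, size_t n)
          {
              assert(x != NULL || n == 0);
              for (size_t i = 0; i < n; i++) {
                  x[i] ^= x[i];
              }
          }
          ```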

          1. 7

            Valid complaints. I’m basically convinced at this point that an easy way of grinding your PhD is to find some “optimization” and get it patched into GCC, which is good sometimes and unfortunately maybe bad for the ecosystem.

            It would make sense (an inconvenient sense!) to update the standard to say “No, that var is uninitialized, you don’t get to do this. Go use __asm__ or your local equivalent if you think this is a good idea.”

            And yeah, that’d break a lot of things, but that’s okay. It’d force people to fix things.

            But adding silent “optimizations” that quietly make programs start doing the wrong thing is not good. :(

            1. 9

              Which PhD committees care if your idea actually lands in GCC?

              I don’t think any of them do. For better or worse the idea is considered separately from the implementation.

              1. 1

                I could be wrong; my impression (from watching some local CS departments) is that folks studying compilers tend to end up contributing that work upstream.

                My impression may, of course, be incorrect.

                1. 1

                  It depends on the school, team, and individual. Most academics are told to focus on getting as many publications out there as possible. Funding bodies seem to focus on quantity more than quality, too. With that incentive, actually building or polishing for release can get them punished in the sense of looking less productive.

                  For example, Antti of Rump Kernel fame told me his advisors or whatever didn’t care that he built it. They just wanted the paper published.

            2. 3

              This is an example of the sort of UB that can usually be identified by the compiler and dealt with in more appropriate ways. If a compiler optimises it away, that’s not really the fault of the committee - more of the compiler.

              (However it’s not possible to require a diagnostic for this without imposing an actual runtime penalty in at least some cases where the problem doesn’t even occur. No compiler can determine in all cases with certainty that an uninitialised value is or isn’t being used - it would have to solve the halting problem to do that).

              Also it’s odd to say that the committee “continues its effort” to kill C when the rules here haven’t really changed for some time.

              That said, at this stage I really wonder if they’d be better off just specifying that all variables are initialised to a default value (of 0). Then you’d pay a very small cost for cases where the compiler couldn’t determine that the initial value was always overwritten before being used - which probably isn’t common anyway - with the benefit of eliminating one cause of UB.
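
              A small sketch of the trade-off described here, with a hypothetical function. The variable is assigned on only one path, so whether it would be read uninitialized depends on run-time input; giving it an explicit default removes the question at the cost of one store. Clang and recent GCC also offer -ftrivial-auto-var-init=zero as an opt-in flag with a similar effect, though it is not part of the standard.

              ```c
              /* Hypothetical example: returns 1 if s starts with 'y',
               * otherwise 0. Without the initializer, the final read of
               * `result` would be indeterminate whenever the branch is
               * not taken. */
              int parse_flag(const char *s)
              {
                  int result = 0;   /* explicit default value */
                  if (s != 0 && s[0] == 'y') {
                      result = 1;
                  }
                  return result;    /* always a determinate value */
              }
              ```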

              1. -1

                The alternative of simply calling the value indeterminate and moving on seems superior. 90% of C programmers are unaware of the expansion of UB driven code rewrites in modern compilers. The effects of these “optimizations” are to make C code more error prone and make it harder to develop and maintain reliable C programs. The language is intrinsically difficult - on purpose - and the cumulative effect of an “optimizing” compiler that seeks out methods of silently sabotaging working code will be to drive people to alternative languages.

                1. 9

                  Most programmers actually are aware of the results, just somewhat tangentially - and worse, they incentivise it.

                  Consider when clang hit the world: people who didn’t have a strong copyleft ideology seemed to trend towards “use clang for development, it has very nice error messages, etc, but gcc for release: it produces faster and smaller code!”

                  At this point, the race was on where the only thing keeping gcc alive until they could bring in a lot of the niceties from clang was the fact that it was so much faster. At this point, low hanging UB fruit starts becoming very tempting.

                  UB driven optimisations will go away when we stop the toxicity of demanding ever faster benchmarks and start asking increasingly for compilers that aim to be correct-in-practice.

                  1. 1

                    The language is intrinsically difficult - on purpose […]

                    Could you say more about that?

                    1. 2

                      It’s designed as an expert’s language and its initial purpose was to replace most of the assembler in OS code.

                      C is a general-purpose programming language which features economy of expression, modern flow control and data structures, and a rich set of operators. C is not a “very high level” language, nor a “big” one, and is not specialized to any particular area of application. But its absence of restrictions and its generality make it more convenient and effective for many tasks than supposedly more powerful languages.

                      • economy of expression
                      • not high level
                      • absence of restrictions

                      Super easy to do something wrong if you don’t know what you are doing. But as the standard has evolved, it has become easier to get unpredictable behavior from the compiler due to the interaction of optimizing compilers and complex specifications. For example, it’s beyond weird that it is permitted to change the logical operation of code by selecting a higher optimization level from the compiler.
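
                      A commonly cited illustration of behavior changing with the optimization level is a signed overflow check. The sketch below is an assumption about the kind of code meant here, with a hypothetical function name; it shows the tempting form only in a comment and implements the well-defined alternative.

                      ```c
                      #include <limits.h>
                      #include <stdbool.h>

                      /* The tempting form `return x + 1 > x;` overflows
                       * when x == INT_MAX. Signed overflow is undefined,
                       * so an optimizer may fold that expression to
                       * `true`, while an unoptimized build may wrap and
                       * return false. The comparison below asks the same
                       * question without ever overflowing. */
                      bool can_increment(int x)
                      {
                          return x < INT_MAX;
                      }
                      ```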