1. 15
  1.  

  2. 16

    Some of the arguments made here seem to overlook that C is from a different era, and that Undefined Behavior is not a category that exists for performance optimization but instead out of necessity.

    For example:

    we mucked up the standard and we are going to cause many systems to fail as these nuanced rules confuse and surprise otherwise careful and highly expert programmers

    Here’s the thing, C has strict aliasing rules because some of the older architectures out there did not store addresses of different kinds of memory in the same way. So, ISO C does not require that float * and int * have the same (or even a compatible) representation. As a result, C was able to support those older platforms. And yes, that means that arbitrary casting is invalid. The exceptions actually are also pretty sensible:

    • You can cast anything to char * (so that you can inspect any type byte-wise)
    • Unions allow you to functionally cast anything to anything else (so long as you only use the union for the aliasing)—however, you can run into alignment issues
    • To avoid all incompatibilities (representation or alignment), you can fall back to memcpy() (see the sketch after this list)
    • Vectors and atomics do not fit into the above three items all the time because their semantics are so radically different it would be difficult to even give meaningful results
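
    To make the valid options concrete, here is a minimal sketch, assuming a platform where float and uint32_t are both four bytes (the commented-out cast at the end is the kind that strict aliasing forbids):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        float f = 1.0f;
        uint32_t bits;

        /* Valid: memcpy() copies the object representation; no aliasing is involved. */
        memcpy(&bits, &f, sizeof bits);
        printf("via memcpy: %08x\n", (unsigned)bits);

        /* Also valid: any object may be inspected byte-wise through (unsigned) char *. */
        const unsigned char *p = (const unsigned char *)&f;
        for (size_t i = 0; i < sizeof f; i++)
            printf("%02x ", p[i]);
        putchar('\n');

        /* Invalid (undefined behavior): bits = *(uint32_t *)&f; */
        return 0;
    }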

    Is this surprising to someone who hasn’t read the specification? Almost certainly (I was definitely surprised by strict aliasing rules when I first learned of them). Is this a fault of the C committee because they’re bad at their jobs? No; this is the product of C being from an era when x86 wasn’t the ruler of architectures and when portability (without sacrificing all performance as compared to asm) was one of the most important goals.

    Finally, a strict reading of the Spec would make plain that “Undefined Behavior” is explicitly not something that the programmer ever invokes. Rather, if you have attempted to do something in your code that “triggers” UB, then your code is not valid C. I’ve heard people argue that it would be better if ISO C required that compilers detect this and print a warning or error at compile-time. While that sounds nice, many cases of UB depend on run-time values (a zero divisor, a null pointer), so detection cannot be done universally at compile time.

    There are definitely some cases of UB that I deeply disapprove of (file does not end with a newline? unmatched quote on a logical line of source code?), but since the category cannot be eliminated wholesale (since some cases of UB are categorized as such because there is nothing that the compiler could even do that would be useful) and since it is unreasonable and infeasible for the spec to require that the compiler always print a diagnostic (warning/error) for behaviors in this category at compile-time, I do not see many other options. Your code is not valid C, the compiler can do whatever it wants.

    1. 6

      There are two problems, but the fundamental problem is the idea that “undefined behavior” is a license for the compiler to arbitrarily transform code. Proposing that you can remove a null check because you assume a prior dereference (UB) would have failed is irresponsible. C undefined behavior meant “we don’t swear to what it does on this architecture”, but the compiler writers/standards developers have treated this as: a correctly behaving compiler can do anything it wants when it finds UB. That was certainly not the original idea, and the fact that expert C programmers have been repeatedly caught out by these changes means that the standards committee broke standard usage. If that had been done for a compelling reason, you could make the argument for it, but the optimization argument is hooey.

      When there is nothing useful the compiler can do, it can just generate the code the programmer has asked for and move on. The first ANSI standard explicitly permits pointer conversion and even converting integers to pointers. The programmer is expected to be aware of what the processor architecture might do, but now she must be alert to some “optimization” that could do anything at all.

      “Your code is not valid C, the compiler can do whatever it wants.”

      Absurd. If the compiler does not know how to compile it, it should refuse to compile it.

      “Rather, if you have attempted to do something in your code that “triggers” UB, then your code is not valid C.”

      This is the interpretation in the standard, but it’s wrong. C permits programmers to do things that are undefined. For example, division without checking for a zero value may fail in some machine-specific way, but that’s not the responsibility of the compiler and certainly not a license to replace the division with arbitrary code.

      The null pointer example is a great one. The original C book says null pointers are not valid, but many UNIX-like kernels and other OS code have mapped page zero so that there is data at address zero. The standards committee should have adapted C to that practice instead of silently breaking widely used and often critical code.

      “Here’s the thing, C has strict aliasing rules because some of the older architectures out there did not store addresses of different kinds of memory in the same way. So, ISO C does not require that float * and int * have the same (or even a compatible) representation. As a result, C was able to support those older platforms. And yes, that means that arbitrary casting is invalid.”

      This is not a new development - what’s new is that “arbitrary casting is invalid”. The intent was that the language and compiler do not promise to hide the details of the processor architecture, but the programmer was certainly permitted to make use of her knowledge of the particular representation of floating point. The premise of C is that these abstractions are not black boxes.

      1. 4

        C permits programmers to do things that are undefined. For example, division without checking for a zero value may fail in some machine-specific way, but that’s not the responsibility of the compiler and certainly not a license to replace the division with arbitrary code.

        It’s a license for the compiler to do effectively anything if the divisor is zero. Division by zero might trap. The compiler is permitted to reorder operations (and disallowing that would inhibit a huge amount of optimizations). So code that does a division by zero might arbitrarily fail at any point on various machines, and cross-platform code (which is the only code the standard concerns itself with) must not perform divisions by zero.
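
        A minimal sketch of what that means in practice (the function names are made up): the division is guarded so that it only ever executes on defined inputs, and nothing machine-specific is relied on.

        #include <limits.h>
        #include <stdio.h>

        /* Returns 1 and stores the quotient on success; returns 0 when the
           division would be undefined (zero divisor, or INT_MIN / -1 overflow). */
        static int checked_div(int num, int den, int *out) {
            if (den == 0 || (num == INT_MIN && den == -1))
                return 0;
            *out = num / den;
            return 1;
        }

        int main(void) {
            int q;
            if (checked_div(10, 0, &q))
                printf("%d\n", q);
            else
                puts("division would be undefined");
            return 0;
        }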

        The null pointer example is a great one. The original C book says null pointers are not valid, but many UNIX-like kernels and other OS code have mapped page zero so that there is data at address zero. The standards committee should have adapted C to that practice instead of silently breaking widely used and often critical code.

        What would you have the standard say? The standard could e.g. require that a null pointer dereference evaluate to an implementation-defined value, but this would be costly on many architectures and mask bugs. (Indeed if the standard were going to require anything I would rather they went the other way and required null pointer dereferences to abort() immediately).

        The intent was that the language and compiler do not promise to hide the details of the processor architecture, but the programmer was certainly permitted to make use of her knowledge of the particular representation of floating point. The premise of C is that these abstractions are not black boxes.

        If you’re not writing cross-platform code then why use C rather than a platform-specific assembler? And certainly why concern yourself with the details of the C standard when you’re only using one implementation?

        1. 2

          “It’s a license for the compiler to do effectively anything if the divisor is zero”

          Anything? If I write if(pressure > MAX){ openvalve(); printf("ratio of pressure to temp = %d\n", pressure/temp);} and the compiler detects that sometimes temp may hold zero, can it delete the whole block and silently produce code that does a no-op as an optimization?

          The standard could say: dereferencing a null pointer has an effect undefined by the C language that depends on the semantics of the address space in which your code executes. It might also offer to warn or error on any detected null as an option.

          “If you’re not writing cross-platform code then why use C rather than a platform-specific assembler? And certainly why concern yourself with the details of the C standard when you’re only using one implementation?”

          There are literally millions of lines of C code for embedded systems, OSes, drivers, numeric code like libgmp (which opens up the representation of FP numbers), architecture-dependent vector-specific operations …

          1. 4

            Anything? If I write if(pressure > MAX){ openvalve(); printf("ratio of pressure to temp = %d\n", pressure/temp);} and the compiler detects that sometimes temp may hold zero, can it delete the whole block and silently produce code that does a no-op as an optimization?

            It can no-op in the cases where the temperature is zero. AIUI the C standard is defined per-execution - a program whose execution would be undefined on some inputs is still well defined on the inputs where no undefined operation occurs.

            (As a reader used to an architecture on which division by zero traps and an architecture which reorders stores, I would say your code is unsafe on its face and the compiler is not being unreasonable here)
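
            For concreteness, a version of that block that is defined on every execution (MAX and openvalve() are stand-ins for whatever the original code declares):

            #include <stdio.h>

            #define MAX 100                                       /* stand-in threshold */
            static void openvalve(void) { puts("valve opened"); } /* stand-in action */

            static void check(int pressure, int temp) {
                if (pressure > MAX) {
                    openvalve();
                    if (temp != 0)  /* the division now runs only when it is defined */
                        printf("ratio of pressure to temp = %d\n", pressure / temp);
                    else
                        puts("temp is zero, ratio not printed");
                }
            }

            int main(void) {
                check(150, 0);
                check(150, 3);
                return 0;
            }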

            The standard could say: dereferencing a null pointer has an effect undefined by the C language that depends on the semantics of the address space in which your code executes. It might also offer to warn or error on any detected null as an option.

            What would be the meaningful difference between that and what it currently says?

            1. 2

              The interpretation of GCC/Clang developers has been that e.g.

              x = p->d; if(p == NULL)abort()

              can be replaced by

              nop

              That was the security bug that made this issue flare up a few years back. The compiler detected the dereference, which did not cause a trap in the application because the 0 page was mapped in, then decided that the test was redundant and could be optimized away.
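
              A sketch of the ordering that does not hand the compiler that inference (the struct and field names are placeholders, not the code from the actual kernel bug):

              #include <stdlib.h>

              struct dev { int d; };

              int read_field(struct dev *p) {
                  /* Test first, then dereference: there is no earlier p->d from which
                     the compiler could conclude that p is non-null and drop the check. */
                  if (p == NULL)
                      abort();
                  return p->d;
              }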

              1. 1

                Yes. I’d say the compiler is right, and the architecture that allowed the zero page mapping is unfortunate. The architecture is already permitting the program to keep running when it doesn’t make sense; if the null check had been preserved that would have solved it in this one instance, but developers writing compensating code to handle things after their program has already become broken is not the right way around to be doing anything.

    2. 7

      In that example, the programmer should not use memset directly, because it is possible for the platform to define 0.0f as something other than a sequence of zero bytes.
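
      Assuming the goal is simply to zero an array of floats, a portable sketch is to assign 0.0f in a loop and let the compiler decide whether that can become a memset() on targets where 0.0f happens to be all-zero bytes:

      #include <stddef.h>

      /* Stores 0.0f into each element. On implementations where 0.0f is
         represented as all-zero bytes the compiler may lower this to memset();
         elsewhere it must emit real floating-point stores. */
      void zero_floats(float *a, size_t n) {
          for (size_t i = 0; i < n; i++)
              a[i] = 0.0f;
      }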

      1. 0

        Good point - which makes the optimization even worse.

        1. 8

          No, it makes the optimization better. Because the compiler would know that and do the right thing.

          1. 6

            No, the compiler knows the representation of 0.0f, so it can perform the optimisation safely.

            1. -2

              maybe it would, and maybe the programmer intends to run the same binary on a machine with the same ISA but different FP units and knows how he or she wants to represent 0.

              1. 8

                “What is a cross-compiler?”

                1. -3

                  great. more binaries so that the compiler can perform an optimization the programmer doesn’t want.

                  1. 4

                    That would be a mind-numbingly awful hack. To make that code even work you’d have to have conditionals everywhere checking what FP unit you’re using in order to ever do math correctly on both machines. In which case you could just add the condition around the memset too.

                    1. 1

                      you mean like the universally used gmp library does?

                      1. 2

                        No, it doesn’t. The whole point of gmp is ultra-high performance. Branching on every low-level operation is NOT high performance. The low-level mpn code that everything else is implemented on has a jillion implementations for different ISAs, and multiple implementations within ISAs that have different FP units or other instructions available. It even says so in gmp/mpn/README:

                        A particular compile will only use code from one subdirectory, and the `generic' subdirectory. The ISA-specific subdirectories contain hierarchies of directories for various architecture variants and implementations; the top-most level contains code that runs correctly on all variants.

                        The arm subdirectory alone has TEN compile-time selected variants:

                        • mpn/arm/neon
                        • mpn/arm/v5
                        • mpn/arm/v6
                        • mpn/arm/v6t2
                        • mpn/arm/v7a
                        • mpn/arm/v7a/cora15
                        • mpn/arm/v7a/cora15/neon
                        • mpn/arm/v7a/cora7
                        • mpn/arm/v7a/cora8
                        • mpn/arm/v7a/cora9
                        1. 1

                          and the generic code has ifdefs that depend on, for example, whether the architecture has a divide/remainder operation that can simplify mod … So the programmer uses her knowledge of the machine architectures.

                          for the FP example I had in mind, you would not need conditionals, just an understanding of the target architectures. C is not Java.
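
                          A toy illustration of that kind of compile-time selection (HAVE_HW_DIVMOD is a made-up configure macro, not a real GMP symbol); each build bakes in exactly one variant:

                          #include <stdint.h>

                          #ifdef HAVE_HW_DIVMOD
                          /* Variant for targets with a fast hardware divide/remainder. */
                          uint32_t mod10(uint32_t x) { return x % 10; }
                          #else
                          /* Variant that avoids the divide: multiply by a precomputed
                             reciprocal of 10 and subtract. */
                          uint32_t mod10(uint32_t x) {
                              uint32_t q = (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
                              return x - q * 10;
                          }
                          #endif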

                          1. 4

                            You do know that ifdefs are compile-time, right? As in, they require generating multiple binaries.

                            1. 2

                              really?

        2. 4

          Dismissing the extremely simple memset example ignores that there are much more complicated situations that are much more difficult to manually optimize.

          Besides, if you violate strict aliasing, you ought to know what you’re getting into or not do it at all. The quoted LLVM dev even tells you how to do it without breaking the rules:

          C requires that these sorts of type conversions happen through memcpy

          Also regarding this bit:

          well known controversial results include removing checks for null pointers due to an unreliable compiler inference about dereference behavior

          Where “unreliable compiler inference” means “the programmer didn’t account for memory ordering or explicit loads in concurrent code”? I can’t think of any other situation where this would happen. Shared memory concurrency is hard. This is not news. If you’re going to write tricky shared-memory concurrent code in C you should know what you’re doing.

          And it’s not only about C. If you don’t correctly address memory use in your code, the processor itself will screw you with hardware-level undefined behavior. Processors will move memory accesses around however they want without certain barrier instructions.
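
          A minimal sketch of the kind of guarding being described, using C11 atomics (the names are made up): the release/acquire pair constrains both the compiler and the hardware, so neither may move the payload access across the flag.

          #include <stdatomic.h>
          #include <stdbool.h>

          static int payload;                   /* ordinary shared data */
          static atomic_bool ready = false;     /* publication flag     */

          void producer(void) {
              payload = 42;
              /* Release: the write to payload is ordered before the flag update. */
              atomic_store_explicit(&ready, true, memory_order_release);
          }

          bool consumer(int *out) {
              /* Acquire: if we observe the flag, we also observe the payload write. */
              if (atomic_load_explicit(&ready, memory_order_acquire)) {
                  *out = payload;
                  return true;
              }
              return false;
          }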

          1. 0

            “Besides, if you violate strict aliasing, you ought to know what you’re getting into or not do it at all.”

            That’s exactly the point: the C programmer should not need to keep up with the latest compiler optimizations.

            One of the classic problems to solve in C is to translate data structures from a big-endian machine to a little-endian machine. Apparently if you do this via a character pointer, you won’t run into C “optimizations”, but otherwise you might. That’s ridiculous.
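
            For what it is worth, the byte-wise version is short and fully defined regardless of the host's byte order (a sketch):

            #include <stdint.h>

            /* Reads a 32-bit big-endian value one byte at a time. Access through
               unsigned char * is one of the aliasing exceptions, so this depends
               neither on the host's endianness nor on any compiler option. */
            uint32_t load_be32(const void *src) {
                const unsigned char *p = src;
                return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
                     | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
            }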

            The unreliable inference has been discussed a lot on security forums. There was a code fragment where the compiler decided to silently remove a null pointer check under the assumption that a prior dereference would have failed if the pointer was null. That’s just stupid.

            1. 7

              endian.h exists for a reason.

              1. 5

                There was a code fragment where the compiler decided to silently remove a null pointer check under the assumption that a prior dereference would have failed if the pointer was null. That’s just stupid.

                No, that’s not just stupid, that’s just true. Unless you have concurrently externally modified that pointer without appropriate guarding. It has nothing to do with compiler optimizations: if you have a variable that’s going to be concurrently modified and you don’t insert guards, the processor itself could do the wrong thing.

                If people are writing real production C software and they don’t want to care about this sort of thing, they should pick another language. There are plenty of mostly good enough languages that they can be lazy in. I fully believe in developer friendliness but C is like this for a reason.

                the C programmer should not need to keep up with the latest compiler optimizations

                The C programmer absolutely should keep up with the latest compiler optimizations. And they should avoid relying on undefined behavior unless they have a good reason to, and are sure it will work with their compiler. It’s not that hard: compilers throw warnings for strict-aliasing violations and other undefined behavior issues. When they don’t, static code analysis does. If you don’t compile with -Wall and you don’t use clang-analyzer or something similar, that’s your problem and you deserve what you get.

                It’s not like these are hokey fringe techniques, someone just today posted a diff to openbsd-tech with fixes found using static code analysis.
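
                For example, the classic type-punning pattern below is the kind of thing GCC typically flags at -O2 with -Wall (the exact diagnostic wording varies by compiler and version):

                #include <stdint.h>

                uint32_t bits_of(float f) {
                    /* Strict-aliasing violation: a float object is read through an
                       incompatible uint32_t lvalue. GCC commonly warns here, e.g.
                       "dereferencing type-punned pointer will break strict-aliasing
                       rules". The defined alternative is memcpy(). */
                    return *(uint32_t *)&f;
                }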

                1. 2

                  This has nothing to do with concurrent modifications. The code contained an error that the compiler made worse.

                  There was an explicit check: if(p == NULL) complain. The compiler saw an earlier dereference of p, which actually did not fail, and concluded on one pass that it could delete the null check code. https://lwn.net/Articles/342330/

                  If only -Wall warned programmers of all UB, then there would be no problem.

                  1. 5

                    Interesting article, I liked it a lot. However I’d like to point out:

                    Several things went wrong to make this exploit possible: security modules were able to grant access to low memory mappings contrary to system policy, the SELinux policy allowed those mappings, pulseaudio can be exploited to make a specific privileged operation available to exploit code, a NULL pointer was dereferenced before being checked, the check was optimized out by the compiler…

                    Out of these steps to exploit, the compiler optimization is hardly the worst. Seriously, 0x0 page memory mapping? That’s comically bad. And probably the only other way to make this optimization problematic.

                    If you’re actually using an embedded system or something that doesn’t error when NULL is dereferenced, just compile with -fno-delete-null-pointer-checks. Done.

                    1. 5

                      The dereference of the null can trap on some architectures. The compiler is permitted to reorder the memory access (you would rule out a lot of optimizations if you disallowed that), so if tun is in fact null that code could trap before, during, or after any of the rest of the function. What requirements would you have the standard impose on code like that? You talk about “warning programmers of all UB”, but the UB here was that the code dereferenced a pointer - would you like all pointer dereferences to generate warnings? Do you want to impose a requirement that the compiler does some kind of flow analysis that keeps track of when a pointer is dereferenced and then checked for null somewhere else in the program? (Good luck formalizing that as a standard).

                      1. 3

                        The warning would be “redundant check for null”. Removing the check is not a safe optimization without at least a warning. Actually, it should have warned on the dereference anyway, which would be a lot more useful than the so-called optimization. If you are going to be able to do sophisticated flow analysis in the compiler you should share information about possible bugs with the programmer.

                        As for reordering - the logical ordering of program code has to be respected, otherwise if(p != NULL) x = *p would be more of a guideline than a rule in any C code, since the compiler could dereference p before the check.

                        Reordering is complex, but not so complex as to allow skipping null tests.

                        1. 2

                          The warning would be “redundant check for null”

                          Hmm. You could warn when removing a null check, maybe, but I think that would still be a prohibitively large number of warnings - there’ll be a lot of code that defensively checks for null, and consider when a macro gets expanded or a function gets inlined.
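
                          A sketch of the inlining case (the names are made up): once get_len() is inlined into total_len(), the defensive check sits after a dereference of the same pointer, so the compiler sees it as dead at that call site, and warning on every such removal would be very noisy.

                          #include <stddef.h>
                          #include <string.h>

                          struct str { char *data; };

                          static size_t get_len(const struct str *s) {
                              if (s == NULL)                /* defensive check */
                                  return 0;
                              return strlen(s->data);
                          }

                          size_t total_len(const struct str *a) {
                              size_t n = strlen(a->data);   /* a is dereferenced here */
                              return n + get_len(a);        /* inlined: check becomes dead */
                          }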

                          If you are going to be able to do sophisticated flow analysis in the compiler you should share information about possible bugs with the programmer.

                          As the article points out, it may not have been sophisticated - two quite simple optimization passes running one after another could have resulted in that behaviour.

                          As for reordering - the logical ordering of program code has to be respected, otherwise if(p != NULL) x = *p would be more of a guideline than a rule in any C code, since the compiler could dereference p before the check.

                          The logical ordering of null dereferences does not have to be respected, since they’re not allowed - it’s the programmer’s responsibility to not dereference null. If you changed that rule you would make it basically impossible for the compiler to do any reordering on architectures on which null dereference traps.

                          1. 1

                            Imagine you had a database system where a query optimization replaced “if account.funds >= request then subtract request” with an unconditional “subtract request” because an optimizer pass had incorrectly assumed in a previous step that the check was not necessary. And then imagine that the developers defended this by demanding that database users keep up with poorly documented or undocumented changes in the optimizer.

                            1. 2

                              There was nothing incorrect about assuming the check was not necessary, and it is well documented that dereferencing a null pointer is undefined behaviour. The surrounding system should have already aborted at the null dereference (and would have, had it not been for the flaw that allowed the zero-page mapping… why am I not surprised that a Poettering component is involved?)