1. 7

    In that example, the programmer should not use memset directly, because it is possible for the platform to define 0.0f as something other than a sequence of zero bytes.
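    To make that concrete, here is a rough sketch of the kind of loop under discussion (the function names are mine, not the article's):

    #include <string.h>

    /* The compiler may lower this loop to memset(buf, 0, n * sizeof *buf),
       but only because it knows the target represents 0.0f as all-zero bytes. */
    void clear_floats(float *buf, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            buf[i] = 0.0f;
    }

    /* Writing memset by hand bakes that representation assumption into the source,
       which is exactly what the point above warns against. */
    void clear_floats_by_hand(float *buf, size_t n)
    {
        memset(buf, 0, n * sizeof *buf);
    }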

    1. 0

      Good point - which makes the optimization even worse.

      1. 8

        No, it makes the optimization better. Because the compiler would know that and do the right thing.

        1. 6

          No, the compiler knows the representation of 0.0f, so it can perform the optimisation safely.

          1. -2

            Maybe it would, and maybe the programmer intends to run the same binary on machines with the same ISA but different FP units, and knows how he or she wants to represent 0.

            1. 8

              “What is a cross-compiler?”

              1. -3

                great. more binaries so that the compiler can perform an optimization the programmer doesn’t want.

                1. 4

                  That would be a mind-numbingly awful hack. To make that code even work you’d have to have conditionals everywhere checking what FP unit you’re using in order to ever do math correctly on both machines. In which case you could just add the condition around the memset too.

                  1. 1

                    you mean like the universally used gmp library does?

                    1. 2

                      No, it doesn’t. The whole point of gmp is ultra high performance. Branching on every low level operation is NOT high performance. The low level mpn code that everything else is implemented on has a jillion implementations for different ISAs, and multiple implementations within ISAs that have different FP units or other instructions available. It even says so in gmp/mpn/README:

                      A particular compile will only use code from one subdirectory, and the `generic' subdirectory. The ISA-specific subdirectories contain hierarchies of directories for various architecture variants and implementations; the top-most level contains code that runs correctly on all variants.

                      The arm subdirectory alone has TEN compile-time selected variants:

                      • mpn/arm/neon
                      • mpn/arm/v5
                      • mpn/arm/v6
                      • mpn/arm/v6t2
                      • mpn/arm/v7a
                      • mpn/arm/v7a/cora15
                      • mpn/arm/v7a/cora15/neon
                      • mpn/arm/v7a/cora7
                      • mpn/arm/v7a/cora8
                      • mpn/arm/v7a/cora9
                      1. 1

                        and the generic code has ifdefs that depend on, for example, whether the architecture has a divide/remainder operation that can simplify mod … So the programmer uses her knowledge of the machine architectures.

                        For the FP example I had in mind, you would not need conditionals, just an understanding of the target architectures. C is not Java.
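                        A sketch of the kind of compile-time selection meant here (HAVE_HW_DIVMOD is a made-up macro for illustration, not a real GMP symbol; real projects set something like it from a configure script):

                        /* Assumes d > 0. One binary per configuration; no run-time branching. */
                        #ifdef HAVE_HW_DIVMOD
                        static unsigned mod_u(unsigned n, unsigned d)
                        {
                            return n % d;          /* target has a fast divide/remainder instruction */
                        }
                        #else
                        static unsigned mod_u(unsigned n, unsigned d)
                        {
                            while (n >= d)         /* fallback for targets without one */
                                n -= d;
                            return n;
                        }
                        #endif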

                        1. 4

                          You do know that ifdefs are compile time, right? As in, they require generating multiple binaries.

                          1. 2

                            really?

      1. 4

        Dismissing the extremely simple memset example ignores that there are much more complicated situations that are much more difficult to manually optimize.

        Besides, if you violate strict aliasing, you ought to know what you’re getting into or not do it at all. The quoted LLVM dev even tells you how to do it without breaking the rules:

        C requires that these sorts of type conversions happen through memcpy
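        For instance, a sketch of the difference (the float/uint32_t pairing is just for illustration):

        #include <stdint.h>
        #include <string.h>

        /* Violates strict aliasing: reads a float object through an incompatible pointer type. */
        uint32_t float_bits_bad(float f)
        {
            return *(uint32_t *)&f;
        }

        /* Well-defined: memcpy copies the object representation; modern compilers
           turn this into a single register move, so there is no cost. */
        uint32_t float_bits_good(float f)
        {
            uint32_t u;
            memcpy(&u, &f, sizeof u);
            return u;
        }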

        Also regarding this bit:

        well known controversial results include removing checks for null pointers due to an unreliable compiler inference about dereference behavior

        Where “unreliable compiler inference” means “the programmer didn’t account for memory ordering or explicit loads in concurrent code?” I can’t think of any other situation where this would happen. Shared memory concurrency is hard. This is not news. If you’re going to write tricky shared memory concurrent code in C you should know what you’re doing.

        And it’s not only about C. If you don’t correctly address memory use in your code, the processor itself will screw you with hardware level undefined behavior. Processors will move memory accesses around however they want without certain barrier instructions.
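        A minimal C11 sketch of the kind of guarding meant here (the variable names are invented):

        #include <stdatomic.h>

        int payload;              /* ordinary shared data */
        atomic_int ready;         /* publication flag */

        void producer(void)
        {
            payload = 42;
            /* release store: the write to payload cannot be reordered after this */
            atomic_store_explicit(&ready, 1, memory_order_release);
        }

        int consumer(void)
        {
            /* acquire load: reads after this cannot be reordered before it */
            if (atomic_load_explicit(&ready, memory_order_acquire))
                return payload;   /* guaranteed to see 42 once ready is observed as 1 */
            return -1;
        }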

        1. 0

          “Besides, if you violate strict aliasing, you ought to know what you’re getting into or not do it at all.”

          That’s exactly the point: the C programmer should not need to keep up with the latest compiler optimizations.

          One of the classic problems to solve in C is to translate data structures from a big endian machine to a little endian machine. Apparently if you do this via a character pointer, you won’t run into C “optimizations” but otherwise you might. That’s ridiculous.
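          For reference, the character-pointer version looks something like this; it is well-defined everywhere because unsigned char is allowed to alias anything:

          #include <stdint.h>

          /* No casts of the buffer to uint32_t *, so no strict-aliasing or alignment trouble. */
          uint32_t read_be32(const unsigned char *p)
          {
              return ((uint32_t)p[0] << 24) |
                     ((uint32_t)p[1] << 16) |
                     ((uint32_t)p[2] <<  8) |
                      (uint32_t)p[3];
          }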

          The unreliable inference has been discussed a lot on security forums. There was a code fragment where the compiler decided to silently remove a null pointer check under the assumption that a prior dereference would have failed if the pointer was null. That’s just stupid.

          1. 7

            endian.h exists for a reason.
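            For example (glibc spells it <endian.h>, the BSDs <sys/endian.h>; availability varies by platform):

            #include <endian.h>    /* <sys/endian.h> on the BSDs */
            #include <stdint.h>

            uint32_t from_wire(uint32_t big_endian_value)
            {
                return be32toh(big_endian_value);   /* no-op on big-endian hosts, byte swap elsewhere */
            }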

            1. 5

              There was a code fragment where the compiler decided to silently remove a null pointer check under the assumption that a prior dereference would have failed if the pointer was null. That’s just stupid.

              No, that’s not just stupid, that’s just true - unless you have concurrently and externally modified that pointer without appropriate guarding. It has nothing to do with compiler optimizations: if you have a variable that’s going to be concurrently modified and you don’t insert guards, the processor itself could do the wrong thing.

              If people are writing real production C software and they don’t want to care about this sort of thing, they should pick another language. There are plenty of mostly good enough languages that they can be lazy in. I fully believe in developer friendliness but C is like this for a reason.

              the C programmer should not need to keep up with the latest compiler optimizations

              The C programmer absolutely should keep up with the latest compiler optimizations. And they should avoid relying on undefined behavior unless they have a good reason to, and are sure it will work with their compiler. It’s not that hard: compilers throw warnings for strict aliasing violations and other undefined behavior issues, and when they don’t, static code analysis does. If you don’t compile with -Wall and you don’t use clang-analyzer or something similar, that’s your problem and you deserve what you get.

              It’s not like these are hokey fringe techniques, someone just today posted a diff to openbsd-tech with fixes found using static code analysis.

              1. 2

                This has nothing to do with concurrent modifications. The code contained an error that the compiler made worse.

                There was an explicit check: if(p == null) complain. The compiler saw an earlier dereference of p, which did not actually fail, and concluded in one pass that it could delete the null check code. https://lwn.net/Articles/342330/
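                The shape of the code was roughly this (names invented; the actual bug was in the kernel’s tun driver):

                struct dev_state { int flags; };

                int poll_dev(struct dev_state *p)
                {
                    int flags = p->flags;   /* dereference of p happens first ... */

                    if (p == NULL)          /* ... so the compiler infers p != NULL and may delete this check */
                        return -1;

                    return flags;
                }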

                If only -Wall warned programmers of all UB, then there would be no problem.

                1. 5

                  Interesting article, I liked it a lot. However I’d like to point out:

                  Several things went wrong to make this exploit possible: security modules were able to grant access to low memory mappings contrary to system policy, the SELinux policy allowed those mappings, pulseaudio can be exploited to make a specific privileged operation available to exploit code, a NULL pointer was dereferenced before being checked, the check was optimized out by the compiler…

                  Out of these steps to exploit, the compiler optimization is hardly the worst. Seriously, 0x0 page memory mapping? That’s comically bad. And probably the only other way to make this optimization problematic.

                  If you’re actually using an embedded system or something that doesn’t error when NULL is dereferenced, just compile with -fno-delete-null-pointer-checks. Done.

                  1. 5

                    The dereference of the null can trap on some architectures. The compiler is permitted to reorder the memory access (you would rule out a lot of optimizations if you disallowed that), so if tun is in fact null it could trap before, during, or after any of the rest of the function. What requirements would you have the standard impose on code like that? You talk about “warning programmers of all UB”, but the UB here was that the code dereferenced a pointer - would you like all pointer dereferences to generate warnings? Do you want to impose a requirement that the compiler does some kind of flow analysis that keeps track of when a pointer is dereferenced and then checked for null somewhere else in the program? (Good luck formalizing that as a standard.)

                    1. 3

                      The warning would be “redundant check for null”. Removing the check is not a safe optimization without at least a warning. Actually, it should have warned on the dereference anyway, which would be a lot more useful than the so-called optimization. If you are going to be able to do sophisticated flow analysis in the compiler you should share information about possible bugs with the programmer.

                      As for reordering - the logical ordering of program code has to be respected, otherwise if(p != null) x = *p would be more of a guideline than a rule in any C code, since the compiler could dereference p before the check.

                      Reordering is complex, but not so complex as to allow skipping null tests.

                      1. 2

                        The warning would be “redundant check for null”

                        Hmm. You could warn when removing a null check, maybe, but I think that would still be a prohibitively large number of warnings - there’ll be a lot of code that defensively checks for null, and consider when a macro gets expanded or a function gets inlined.
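                        For instance (names invented), a defensive callee inlined into a caller that has already dereferenced the pointer leaves a “redundant” check behind, and warning on every one of those would drown out the interesting cases:

                        #include <string.h>

                        /* Defensive library-style helper: tolerates NULL. */
                        static size_t len_or_zero(const char *s)
                        {
                            if (s == NULL)
                                return 0;
                            return strlen(s);
                        }

                        size_t first_plus_len(const char *s)
                        {
                            size_t first = (size_t)s[0];     /* caller dereferences s unconditionally ... */
                            return first + len_or_zero(s);   /* ... so after inlining, the NULL check above is provably dead */
                        }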

                        If you are going to be able to do sophisticated flow analysis in the compiler you should share information about possible bugs with the programmer.

                        As the article points out, it may not have been sophisticated - two quite simple optimization passes running one after another could have resulted in that behaviour.

                        As for reordering - the logical ordering of program code has to be respected, otherwise if(p != null) x = *p would be more of a guideline than a rule in any C code, since the compiler could dereference p before the check.

                        The logical ordering of null dereferences does not have to be respected, since they’re not allowed - it’s the programmer’s responsibility to not dereference null. If you changed that rule you would make it basically impossible for the compiler to do any reordering on architectures on which null dereference traps.

                        1. 1

                          Imagine you had a database system where a query optimization dropped the check in “if account.funds >= request then subtract request” because the optimizer, in a previous step, incorrectly assumed that the check was not necessary. And then imagine that the developers defended this by demanding that database users keep up with poorly documented or undocumented changes in the optimizer.

                          1. 2

                            There was nothing incorrect about assuming the check was not necessary, and it is well documented that dereferencing a null pointer is undefined behaviour. The surrounding system should have already aborted at the null dereference (and would have, had it not been for the flaw that allowed the zero-page mapping… why am I not surprised that a Poettering component is involved?)

            1. 5

              Uh, since when is it acceptable for memmove to simply copy in the opposite direction? My understanding is that it must work no matter which part of the data is overlapping, as if it made a temporary copy somewhere else entirely. Simply reversing the copy direction cannot guarantee this… memmove(&s[1], &s[3], 3);

              1. 4

                I think he oversimplified a bit there. You at any rate sometimes need to copy downwards, and sometimes need to copy upwards, so having the direction flag smashed by signals kills you either way, but I agree that his comment as-phrased seems very weird.

                1. 1
                  memmove(&s[1], &s[3], 3);
                  

                  simplifies to

                  s[1] = s[3];
                  s[2] = s[4];
                  s[3] = s[5];
                  
                  1. 2

                    That would be memcpy (although it also makes no guarantee on the order/direction)

                    1. 2

                      What’s your point? memmove() is just an overlap-safe memcpy(), trivially implemented with variable direction and atomic width.
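                      A naive sketch of that (real implementations copy a word at a time and are far more tuned, and the bare pointer comparison is something only the libc implementation itself gets to rely on):

                      #include <stddef.h>

                      void *my_memmove(void *dst, const void *src, size_t n)
                      {
                          unsigned char *d = dst;
                          const unsigned char *s = src;

                          if (d < s) {
                              while (n--) *d++ = *s++;   /* destination below source: copy forwards */
                          } else if (d > s) {
                              d += n;
                              s += n;
                              while (n--) *--d = *--s;   /* destination above source: copy backwards */
                          }
                          return dst;
                      }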

                  2. 1

                    Wouldn’t copying in reverse work just fine for that? Copy 3 to 5, then 2 to 4, and finally 1 to 3?

                    EDIT: Whoops, mixed up the source and dest.